Re: [ovs-dev] [PATCH RFC] netdev-afxdp: Enable shared umem support.
On 5 Nov 2019, at 23:16, William Tu wrote: The RFC patch enables shared umem support. It requires kernel change and libbpf change, I will post it in another thread. I tested with multiple afxdp ports using skb mode. For example: ovs-vsctl -- set interface afxdp-p0 options:n_rxq=1 type="afxdp" options:xdpmode=skb ovs-vsctl -- set interface afxdp-p1 options:n_rxq=1 type="afxdp" options:xdpmode=skb Will share one umem instead of two. Note that once shared umem is created with a specific mode (ex: XDP_COPY), the netdev that shares this umem can not change its mode. So I'm thinking about using just one shared umem for all skb-mode netdevs, and for the rest drv-mode netdevs, keep using dedicated umem. Or create one umem per mode? So the drv-mode netdevs also share one umem. Hi William, Did not go trough the entire patch, but wasn’t Magnus hinting on not using the shared umem ring’s but just the buffers they point too? This way you do not need any locking when doing the ring operations, only when getting buffers. Guess this is more like the mbuf implementation in DPDK. //Eelco Any comments are welcome. Suggested-by: Eelco Chaudron Signed-off-by: William Tu --- lib/netdev-afxdp.c | 97 +++--- 1 file changed, 70 insertions(+), 27 deletions(-) diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c index 3037770b27cb..42767e1e27f3 100644 --- a/lib/netdev-afxdp.c +++ b/lib/netdev-afxdp.c @@ -95,6 +95,8 @@ static void xsk_destroy(struct xsk_socket_info *xsk); static int xsk_configure_all(struct netdev *netdev); static void xsk_destroy_all(struct netdev *netdev); +static struct xsk_umem_info *shared_umem; + struct unused_pool { struct xsk_umem_info *umem_info; int lost_in_rings; /* Number of packets left in tx, rx, cq and fq. */ @@ -112,6 +114,8 @@ struct xsk_umem_info { struct xsk_ring_cons cq; struct xsk_umem *umem; void *buffer; +int refcount; +struct ovs_mutex mutex; }; struct xsk_socket_info { @@ -228,6 +232,7 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode) uconfig.comp_size = CONS_NUM_DESCS; uconfig.frame_size = FRAME_SIZE; uconfig.frame_headroom = OVS_XDP_HEADROOM; +ovs_mutex_init(>mutex); ret = xsk_umem__create(>umem, buffer, size, >fq, >cq, ); @@ -296,6 +301,7 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex, struct xsk_socket_info *xsk; char devname[IF_NAMESIZE]; uint32_t idx = 0, prog_id; +bool shared; int ret; int i; @@ -304,6 +310,7 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex, cfg.rx_size = CONS_NUM_DESCS; cfg.tx_size = PROD_NUM_DESCS; cfg.libbpf_flags = 0; +shared = umem->refcount > 1; if (xdpmode == XDP_ZEROCOPY) { cfg.bind_flags = XDP_ZEROCOPY; @@ -319,6 +326,10 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex, } #endif +if (shared) { +cfg.bind_flags = XDP_SHARED_UMEM; +} + if (if_indextoname(ifindex, devname) == NULL) { VLOG_ERR("ifindex %d to devname failed (%s)", ifindex, ovs_strerror(errno)); @@ -352,6 +363,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex, return NULL; } +if (shared) { +return xsk; +} + +/* Only the first umem needs to go below. */ while (!xsk_ring_prod__reserve(>umem->fq, PROD_NUM_DESCS, )) { VLOG_WARN_RL(, "Retry xsk_ring_prod__reserve to FILL queue"); @@ -380,33 +396,43 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode, { struct xsk_socket_info *xsk; struct xsk_umem_info *umem; -void *bufs; +void *bufs = NULL; netdev_afxdp_sweep_unused_pools(NULL); -/* Umem memory region. */ -bufs = xmalloc_pagealign(NUM_FRAMES * FRAME_SIZE); -memset(bufs, 0, NUM_FRAMES * FRAME_SIZE); +if (!shared_umem) { +/* Umem memory region. */ +bufs = xmalloc_pagealign(NUM_FRAMES * FRAME_SIZE); +memset(bufs, 0, NUM_FRAMES * FRAME_SIZE); + +/* Create AF_XDP socket. */ +umem = xsk_configure_umem(bufs, + NUM_FRAMES * FRAME_SIZE, + xdpmode); +if (!umem) { +free_pagealign(bufs); +return NULL; +} -/* Create AF_XDP socket. */ -umem = xsk_configure_umem(bufs, - NUM_FRAMES * FRAME_SIZE, - xdpmode); -if (!umem) { -free_pagealign(bufs); -return NULL; +shared_umem = umem; +umem->refcount++; +VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem); +} else { +umem = shared_umem; +umem->refcount++; } -VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem); - xsk =
[ovs-dev] [PATCH v3] compat: Add compat fix for old kernels
In kernels older than 4.8, struct tcf_t didn't have the firstuse. If openvswitch is compiled with the compat pkt_cls.h then there is a struct size mismatch between openvswitch and the kernel which cause parsing netlink actions to fail. After this commit parsing the netlink actions pass even if compiled with the compat pkt_cls.h. Signed-off-by: Roi Dayan --- Notes: v2->v3: - use AC_CHECK_MEMBER macro instead of AC_COMPILE_IFELSE v1->v2: - fix mix of tabs and spaces in acinclude.m4 acinclude.m4| 2 ++ include/linux/pkt_cls.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/acinclude.m4 b/acinclude.m4 index a0507cfe019e..bc5b1c667eff 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -185,6 +185,8 @@ AC_DEFUN([OVS_CHECK_LINUX_TC], [ [AC_DEFINE([HAVE_TCA_FLOWER_KEY_ENC_IP_TTL_MASK], [1], [Define to 1 if TCA_FLOWER_KEY_ENC_IP_TTL_MASK is available.])]) + AC_CHECK_MEMBERS([struct tcf_t.firstuse], [], [], [#include ]) + AC_COMPILE_IFELSE([ AC_LANG_PROGRAM([#include ], [ int x = TCA_VLAN_PUSH_VLAN_PRIORITY; diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h index 4adea59e7c36..2f7e328c48c8 100644 --- a/include/linux/pkt_cls.h +++ b/include/linux/pkt_cls.h @@ -64,7 +64,9 @@ struct tcf_t { __u64 install; __u64 lastuse; __u64 expires; +#ifdef HAVE_STRUCT_TCF_T_FIRSTUSE __u64 firstuse; +#endif }; #define tc_gen \ -- 2.8.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] compat: Add compat fix for old kernels
On 2019-11-05 11:04 PM, Ben Pfaff wrote: > On Mon, Nov 04, 2019 at 10:11:05AM +0200, Roi Dayan wrote: >> In kernels older than 4.8, struct tcf_t didn't have the firstuse. >> If openvswitch is compiled with the compat pkt_cls.h then there is >> a struct size mismatch between openvswitch and the kernel which cause >> parsing netlink actions to fail. >> After this commit parsing the netlink actions pass even if compiled with >> the compat pkt_cls.h. >> >> Signed-off-by: Roi Dayan >> --- >> acinclude.m4| 8 >> include/linux/pkt_cls.h | 2 ++ >> 2 files changed, 10 insertions(+) >> >> diff --git a/acinclude.m4 b/acinclude.m4 >> index a0507cfe019e..12513b82747d 100644 >> --- a/acinclude.m4 >> +++ b/acinclude.m4 >> @@ -186,6 +186,14 @@ AC_DEFUN([OVS_CHECK_LINUX_TC], [ >> [Define to 1 if TCA_FLOWER_KEY_ENC_IP_TTL_MASK is >> available.])]) >> >>AC_COMPILE_IFELSE([ >> +AC_LANG_PROGRAM([#include ], [ >> +struct tcf_t x; >> +x.firstuse = 1; >> +])], >> +[AC_DEFINE([HAVE_TCF_T_FIRSTUSE], [1], >> + [Define to 1 if struct tcf_t have firstuse.])]) >> + > > I think you can use AC_CHECK_MEMBERS: > > AC_CHECK_MEMBERS([struct tcf_t.firstuse], [], [], [#include > ]) > thanks, it works fine. I weren't aware of this macro. i'll send v3. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] same tcp session encapsulated with different udp src port in kernel mode if packet has do ip_forward
On Mon, Nov 4, 2019 at 7:44 PM ychen wrote: > > > > we can easily reproduce this phenomenon by using tcp socket stream sending > from ovs internal port. > > > > > At 2019-10-30 19:49:16, "ychen" wrote: > > Hi, >when we use docker to establish tcp session, we found that the packet > which must do upcall to userspace has different encapsulated udp source port >with packet that only needs do datapath flow forwarding. > > >After some code research and kprobe debug, we found the following: >1. use udp_flow_src_port() to get the port > so when both skb->l4_hash==0 and skb->sw_hash==0, 5 tuple data will > be used to calculate the skb->hash > 2. when first packet of tcp session coming, packet needs do upcall to > userspace, and then ovs_packet_cmd_execute() called > new skb is allocated with both l4_hash and sw_hash set to 0 > 3. when none first packet of tcp sesion coming, function > ovs_dp_process_packet()->ovs_execute_actions() called, > and this time original skb is reserved. > when packet has do ip_forward(), kprobe debug prints skb->l4_hash=1, > sw_hash=0 > 4. we searched kernel code, and found such code: > skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk) > { if (sk->sk_txhash) { > skb->l4_hash = 1; > skb->hash = sk->sk_txhash; > } >} > static inline void sk_set_txhash(struct sock *sk) > {sk->sk_txhash = net_tx_rndhash(); ==>it is a random > value!!} >5. so let's have a summary: >when packet is processing only in datapath flow, skb->hash is random > value for the same tcp session? >when packet needs processing first to userspace, than kernel space, > skb->hash is calculated by 5 tuple? > >Our testing enviroment: >debian 9, kernel 4.9.65 >ovs version: 2.8.2 > > >Simple topo is like this: >docker_eth0<---+ > | veth ip_forward > > +host_veth0<->port-eth(ovs-ineternal) > host_veth0 and port-eth device stay in physical host. > > >So can we treat skb->hash as a attribute, when send packet to userspace, > encode this attribute; >and then do ovs_packet_cmd_execute(), retrieve the same hash value from > userspace? > > > another important tips: > if we send packets from qemu based tap device, vxlan source port is always > same for the same tcp session; > only when send packets from docker in which packets will do ip_forward, > vxlan source port may different for same tcp session. Should be fixed. The patch will be sent. > > > > > > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v2] system-ovn.at: Create IPv6 load balancing tests
Bleep bloop. Greetings Russell Bryant, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Line is 184 characters long (recommended limit is 79) #746 FILE: p-set-addresses:1987: tcp,orig=(src=172.16.1.3,dst=30.0.0.1,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.1.3,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 184 characters long (recommended limit is 79) #747 FILE: p-set-addresses:1988: tcp,orig=(src=172.16.1.3,dst=30.0.0.1,sport=,dport=),reply=(src=192.168.2.2,dst=172.16.1.3,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 185 characters long (recommended limit is 79) #763 FILE: p-set-addresses:1995: tcp,orig=(src=172.16.1.3,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=20.0.0.2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 185 characters long (recommended limit is 79) #764 FILE: p-set-addresses:1996: tcp,orig=(src=172.16.1.3,dst=192.168.2.2,sport=,dport=),reply=(src=192.168.2.2,dst=20.0.0.2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 101 characters long (recommended limit is 79) #920 FILE: p-set-addresses:2156: NS_CHECK_EXEC([alice1], [wget http://[[fd30::1]] -t 5 -T 1 --retry-connrefused -v -o wget$i.log]) WARNING: Line is 173 characters long (recommended limit is 79) #929 FILE: p-set-addresses:2162: tcp,orig=(src=fd72::3,dst=fd30::1,sport=,dport=),reply=(src=fd11::2,dst=fd72::3,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 173 characters long (recommended limit is 79) #930 FILE: p-set-addresses:2163: tcp,orig=(src=fd72::3,dst=fd30::1,sport=,dport=),reply=(src=fd12::2,dst=fd72::3,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 173 characters long (recommended limit is 79) #939 FILE: p-set-addresses:2169: tcp,orig=(src=fd72::3,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd20::2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 173 characters long (recommended limit is 79) #940 FILE: p-set-addresses:2170: tcp,orig=(src=fd72::3,dst=fd12::2,sport=,dport=),reply=(src=fd12::2,dst=fd20::2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 89 characters long (recommended limit is 79) #960 FILE: p-set-addresses:2337: -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ WARNING: Line is 102 characters long (recommended limit is 79) #1036 FILE: p-set-addresses:2413: ovn-nbctl set load_balancer $uuid vips:'"[[fd72::11]]:8000"'='"@<:@fd01::2@:>@:80,@<:@fd02::2@:>@:80"' WARNING: Line is 102 characters long (recommended limit is 79) #1050 FILE: p-set-addresses:2427: NS_CHECK_EXEC([alice1], [wget http://[[fd72::10]] -t 5 -T 1 --retry-connrefused -v -o wget$i.log]) WARNING: Line is 80 characters long (recommended limit is 79) #1054 FILE: p-set-addresses:2431: AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd72::10) | grep -v fe80 | WARNING: Line is 174 characters long (recommended limit is 79) #1056 FILE: p-set-addresses:2433: tcp,orig=(src=fd72::2,dst=fd72::10,sport=,dport=),reply=(src=fd01::2,dst=fd72::2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 174 characters long (recommended limit is 79) #1057 FILE: p-set-addresses:2434: tcp,orig=(src=fd72::2,dst=fd72::10,sport=,dport=),reply=(src=fd02::2,dst=fd72::2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 107 characters long (recommended limit is 79) #1063 FILE: p-set-addresses:2440: NS_CHECK_EXEC([alice1], [wget http://[[fd72::11]]:8000 -t 5 -T 1 --retry-connrefused -v -o wget$i.log]) WARNING: Line is 80 characters long (recommended limit is 79) #1067 FILE: p-set-addresses:2444: AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd72::11) | grep -v fe80 | WARNING: Line is 174 characters long (recommended limit is 79) #1069 FILE: p-set-addresses:2446: tcp,orig=(src=fd72::2,dst=fd72::11,sport=,dport=),reply=(src=fd01::2,dst=fd72::2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 174 characters long (recommended limit is 79) #1070 FILE: p-set-addresses:2447: tcp,orig=(src=fd72::2,dst=fd72::11,sport=,dport=),reply=(src=fd02::2,dst=fd72::2,sport=,dport=),zone=,protoinfo=(state=) Lines checked: 1094, Warnings: 19, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [OVN][RAFT]Why left server cannot be added back to
Hi,Ben, I am agreed that ,under normal circumstances, there is no need to allow left server back to origin server . But I met some strange situation. For example ,sometimes, one server may be with two diffrent UUIDs recorded by Leader server. I think if the left server could be back to origin server, it will be a easy solution to solve many strange problems. Thanks, Yun At 2019-11-06 05:31:25, "Ben Pfaff" wrote: >On Tue, Nov 05, 2019 at 08:10:41PM +0800, taoyunupt wrote: >> Hi,Numan, >> When I run OVN/RAFT cluster, I found a server(which >> initiative to leave or be kicked off ) ,cannot be added back to origin >> cluster. I found the code as following, can you tell me the reason , many >> thanks! >> >> >> case RAFT_REC_NOTE: >> if (!strcmp(r->note, "left")) { >> return ovsdb_error(NULL, "record %llu indicates server has left " >>"the cluster; it cannot be added back (use " >>"\"ovsdb-tool join-cluster\" to add a new " >>"server)", rec_idx); > >The Raft dissertation doesn't contemplate the possibility of a server >re-joining a cluster. Allowing it would add new corner cases that >aren't worth dealing with. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] can OVS conntrack support IP list like this: actions=ct(commit, table=0, zone=1, nat(dst=220.0.0.3, 220.0.0.7, 220.0.0.123))?
On Tue, Nov 5, 2019 at 5:37 PM Darrell Ball wrote: > > > On Tue, Nov 5, 2019 at 4:32 PM Yi Yang (杨燚)-云服务集团 > wrote: > >> Hi, folks >> >> >> >> We need to do SNAT for many internal IPs by just using several public IPs, >> we also need to do DNAT by some other public IPs for exposing webservice, >> openflow rules look like the below: >> >> >> >> table=0,ip,nw_src=172.17.0.0/16, >> …,actions=ct(commit,table=0,zone=1,nat(src= >> 220.0.0.3,220.0.0.7,220.0.0.123)) >> >> >> table=0,ip,nw_src=172.18.0.67,…,actions=ct(commit,table=0,zone=1,nat(src=22 >> 0.0.0.3,220.0.0.7,220.0.0.123)) >> >> > for snat, you can map some subset of private IPs to a given public IP and > so on > > > >> >> table=0,ip,tcp,nw_dst=220.0.0.11,tp_dst=80,…,actions=ct(commit,table=0,zone >> =2,nat(dst=172.16.0.100:80)) >> >> table=0,ip,tcp,nw_dst=220.0.0.11, >> tp_dst=443,…,actions=ct(commit,table=0,zone=2,nat(dst=172.16.0.100:443)) >> > > you are mapping 'to' private IPs, so you have control over the range > > >> >> >> >> >> From ct document, it seems it can’t support IP list for nat, anybody knows >> how we can handle such cases in some kind feasible way? >> >> >> >> In addition, is it ok if multiple openflow rules use the same NAT IP:PORT >> combination? I’m not sure if it will result in some conflicts for SNAT, >> because all of them need to do dynamic source port mapping, per my test, >> it >> seems this isn’t a problem. >> > > IIUC, as long as tuples are unique, it should be fine > probably, you should give an example of what you mean by above I am not sure you are meaning to say that you want to specify an L4 port in your snat action rule or not; you will want to use ephemeral ports by not specifying a specific port in most cases > > >> >> >> >> Thank you all in advance and appreciate your help sincerely. >> >> ___ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] can OVS conntrack support IP list like this: actions=ct(commit, table=0, zone=1, nat(dst=220.0.0.3, 220.0.0.7, 220.0.0.123))?
On Tue, Nov 5, 2019 at 4:32 PM Yi Yang (杨燚)-云服务集团 wrote: > Hi, folks > > > > We need to do SNAT for many internal IPs by just using several public IPs, > we also need to do DNAT by some other public IPs for exposing webservice, > openflow rules look like the below: > > > > table=0,ip,nw_src=172.17.0.0/16, > …,actions=ct(commit,table=0,zone=1,nat(src= > 220.0.0.3,220.0.0.7,220.0.0.123)) > > table=0,ip,nw_src=172.18.0.67,…,actions=ct(commit,table=0,zone=1,nat(src=22 > 0.0.0.3,220.0.0.7,220.0.0.123)) > > for snat, you can map some subset of private IPs to a given public IP and so on > table=0,ip,tcp,nw_dst=220.0.0.11,tp_dst=80,…,actions=ct(commit,table=0,zone > =2,nat(dst=172.16.0.100:80)) > > table=0,ip,tcp,nw_dst=220.0.0.11, > tp_dst=443,…,actions=ct(commit,table=0,zone=2,nat(dst=172.16.0.100:443)) > you are mapping 'to' private IPs, so you have control over the range > > > > > From ct document, it seems it can’t support IP list for nat, anybody knows > how we can handle such cases in some kind feasible way? > > > > In addition, is it ok if multiple openflow rules use the same NAT IP:PORT > combination? I’m not sure if it will result in some conflicts for SNAT, > because all of them need to do dynamic source port mapping, per my test, it > seems this isn’t a problem. > IIUC, as long as tuples are unique, it should be fine > > > > Thank you all in advance and appreciate your help sincerely. > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn v2] system-ovn.at: Create IPv6 load balancing tests
Duplicate all of the IPv4 load balancing test cases for IPv6. All of these are passing without any changes needed in OVN code, but this will help ensure that we do not have any IPv6 load balancing regressions in the future. Signed-off-by: Russell Bryant --- tests/system-ovn.at | 876 1 file changed, 799 insertions(+), 77 deletions(-) v1 -> v2: - Use [[ and ]] instead of quadigraphs to enhance readability, as suggested by Ben Pfaff diff --git a/tests/system-ovn.at b/tests/system-ovn.at index b3f90aae2..5885df58e 100644 --- a/tests/system-ovn.at +++ b/tests/system-ovn.at @@ -1158,6 +1158,153 @@ tcp,orig=(src=192.168.1.2,dst=30.0.0.2,sport=,dport=),reply=(s ]) +OVS_APP_EXIT_AND_WAIT([ovn-controller]) + +as ovn-sb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as ovn-nb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as northd +OVS_APP_EXIT_AND_WAIT([ovn-northd]) + +as +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d"]) +AT_CLEANUP + +AT_SETUP([ovn -- load-balancing - IPv6]) +AT_KEYWORDS([ovnlb]) + +CHECK_CONNTRACK() +CHECK_CONNTRACK_NAT() +ovn_start +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-int]) + +# Set external-ids in br-int needed for ovn-controller +ovs-vsctl \ +-- set Open_vSwitch . external-ids:system-id=hv1 \ +-- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ +-- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ +-- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ +-- set bridge br-int fail-mode=secure other-config:disable-in-band=true + +# Start ovn-controller +start_daemon ovn-controller + +# Logical network: +# 2 logical switches "foo" (fd01::/64) and "bar" (fd02::/64) +# connected to a router R1. +# foo has foo1 to act as a client. +# bar has bar1, bar2, bar3 to act as servers. +# +# Loadbalancer VIPs in fd03::/64 network. + +ovn-nbctl create Logical_Router name=R1 +ovn-nbctl ls-add foo +ovn-nbctl ls-add bar + +# Connect foo to R1 +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 fd01::1/64 +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \ +type=router options:router-port=foo addresses=\"00:00:01:01:02:03\" + +# Connect bar to R1 +ovn-nbctl lrp-add R1 bar 00:00:01:01:02:04 fd02::1/64 +ovn-nbctl lsp-add bar rp-bar -- set Logical_Switch_Port rp-bar \ +type=router options:router-port=bar addresses=\"00:00:01:01:02:04\" + +# Create logical port 'foo1' in switch 'foo'. +ADD_NAMESPACES(foo1) +ADD_VETH(foo1, foo1, br-int, "fd01::2/64", "f0:00:00:01:02:03", \ + "fd01::1") +ovn-nbctl lsp-add foo foo1 \ +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 fd01::2" + +# Create logical ports 'bar1', 'bar2', 'bar3' in switch 'bar'. +ADD_NAMESPACES(bar1) +ADD_VETH(bar1, bar1, br-int, "fd02::2/64", "f0:00:0f:01:02:03", \ + "fd02::1") +ovn-nbctl lsp-add bar bar1 \ +-- lsp-set-addresses bar1 "f0:00:0f:01:02:03 fd02::2" + +ADD_NAMESPACES(bar2) +ADD_VETH(bar2, bar2, br-int, "fd02::3/64", "f0:00:0f:01:02:04", \ + "fd02::1") +ovn-nbctl lsp-add bar bar2 \ +-- lsp-set-addresses bar2 "f0:00:0f:01:02:04 fd02::3" + +ADD_NAMESPACES(bar3) +ADD_VETH(bar3, bar3, br-int, "fd02::4/64", "f0:00:0f:01:02:05", \ + "fd02::1") +ovn-nbctl lsp-add bar bar3 \ +-- lsp-set-addresses bar3 "f0:00:0f:01:02:05 fd02::4" + +# Config OVN load-balancer with a VIP. +uuid=`ovn-nbctl create load_balancer vips:\"fd03::1\"=\"fd02::2,fd02::3,fd02::4\"` +ovn-nbctl set logical_switch foo load_balancer=$uuid + +# Create another load-balancer with another VIP. +uuid=`ovn-nbctl create load_balancer vips:\"fd03::3\"=\"fd02::2,fd02::3,fd02::4\"` +ovn-nbctl add logical_switch foo load_balancer $uuid + +# Config OVN load-balancer with another VIP (this time with ports). +ovn-nbctl set load_balancer $uuid vips:'"[[fd03::2]]:8000"'='"@<:@fd02::2@:>@:80,@<:@fd02::3@:>@:80,@<:@fd02::4@:>@:80"' + +# Wait for ovn-controller to catch up. +ovn-nbctl --wait=hv sync +OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-groups br-int | \ +grep 'nat(dst=\[[fd02::4\]]:80)']) + +# Start webservers in 'bar1', 'bar2' and 'bar3'. +OVS_START_L7([bar1], [http6]) +OVS_START_L7([bar2], [http6]) +OVS_START_L7([bar3], [http6]) + +dnl Should work with the virtual IP fd03::1 address through NAT +for i in `seq 1 20`; do +echo Request $i +NS_CHECK_EXEC([foo1], [wget http://[[fd03::1]] -t 5 -T 1 --retry-connrefused -v -o wget$i.log || (ovs-ofctl -O OpenFlow13 dump-flows br-int && false)]) +done + +dnl Each server should have at least one connection. +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd03::1) | grep -v fe80 | \ +sed -e 's/zone=[[0-9]]*/zone=/'], [0], [dnl +tcp,orig=(src=fd01::2,dst=fd03::1,sport=,dport=),reply=(src=fd02::2,dst=fd01::2,sport=,dport=),zone=,protoinfo=(state=) +tcp,orig=(src=fd01::2,dst=fd03::1,sport=,dport=),reply=(src=fd02::3,dst=fd01::2,sport=,dport=),zone=,protoinfo=(state=)
Re: [ovs-dev] [PATCH] compat: Add missing inline keyword
On 11/5/2019 3:14 PM, Ben Pfaff wrote: On Tue, Nov 05, 2019 at 02:14:24PM -0800, Greg Rose wrote: The missing inline keyword before the definition of the rpl_nf_ct_tmpl_free() function causes spurious warnings about the function not being used on some older kernels. Add the keyword to suppress the warning. Signed-off-by: Greg Rose The commit message and the patch make sense given my experience of GCC behavior. I haven't tested it. Acked-by: Ben Pfaff I only did a compile test of it on Travis. https://travis-ci.org/gvrose8192/ovs-experimental/builds/607794710 There's one error but it's a resource or infrastructure issue I think. Thanks, - Greg ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv4] ofproto-dpif: Expose datapath capability to ovsdb.
On Tue, Nov 05, 2019 at 02:57:46PM -0800, William Tu wrote: > +/* ODP_SUPPORT_FIELDS */ > +str_value = xasprintf("%"PRIuSIZE, odp.max_vlan_headers); > +smap_add(cap, "max_vlan_headers", str_value); > +free(str_value); I think that you can shorten the above to: smap_add_format(cap, "max_vlan_headers", "%"PRIuSIZE, odp.max_vlan_headers); and similarly for other cases. I think that we can improve the documentation. I'm working on a suggestion for that, please give me a while to write it up. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] can OVS conntrack support IP list like this: actions=ct(commit, table=0, zone=1, nat(dst=220.0.0.3, 220.0.0.7, 220.0.0.123))?
Hi, folks We need to do SNAT for many internal IPs by just using several public IPs, we also need to do DNAT by some other public IPs for exposing webservice, openflow rules look like the below: table=0,ip,nw_src=172.17.0.0/16,…,actions=ct(commit,table=0,zone=1,nat(src= 220.0.0.3,220.0.0.7,220.0.0.123)) table=0,ip,nw_src=172.18.0.67,…,actions=ct(commit,table=0,zone=1,nat(src=22 0.0.0.3,220.0.0.7,220.0.0.123)) table=0,ip,tcp,nw_dst=220.0.0.11,tp_dst=80,…,actions=ct(commit,table=0,zone =2,nat(dst=172.16.0.100:80)) table=0,ip,tcp,nw_dst=220.0.0.11, tp_dst=443,…,actions=ct(commit,table=0,zone=2,nat(dst=172.16.0.100:443)) >From ct document, it seems it can’t support IP list for nat, anybody knows how we can handle such cases in some kind feasible way? In addition, is it ok if multiple openflow rules use the same NAT IP:PORT combination? I’m not sure if it will result in some conflicts for SNAT, because all of them need to do dynamic source port mapping, per my test, it seems this isn’t a problem. Thank you all in advance and appreciate your help sincerely. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next] xsk: Enable shared umem support.
Currently the shared umem feature is not supported in libbpf. The patch removes the refcount check in libbpf to enable use of shared umem. Also, a umem can be shared by multiple netdevs, so remove the checking at xsk_bind. Tested using OVS at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-November/364392.html Signed-off-by: William Tu --- net/xdp/xsk.c | 5 - tools/lib/bpf/xsk.c | 10 +- 2 files changed, 5 insertions(+), 10 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 6040bc2b0088..0f2b16e275e3 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -697,11 +697,6 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) sockfd_put(sock); goto out_unlock; } - if (umem_xs->dev != dev || umem_xs->queue_id != qid) { - err = -EINVAL; - sockfd_put(sock); - goto out_unlock; - } xdp_get_umem(umem_xs->umem); WRITE_ONCE(xs->umem, umem_xs->umem); diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c index 74d84f36a5b2..e6c4eb077dcd 100644 --- a/tools/lib/bpf/xsk.c +++ b/tools/lib/bpf/xsk.c @@ -579,16 +579,13 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, struct sockaddr_xdp sxdp = {}; struct xdp_mmap_offsets off; struct xsk_socket *xsk; + bool shared; int err; if (!umem || !xsk_ptr || !rx || !tx) return -EFAULT; - if (umem->refcount) { - pr_warn("Error: shared umems not supported by libbpf.\n"); - return -EBUSY; - } - + shared = !!(usr_config->bind_flags & XDP_SHARED_UMEM); xsk = calloc(1, sizeof(*xsk)); if (!xsk) return -ENOMEM; @@ -687,6 +684,9 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, sxdp.sxdp_queue_id = xsk->queue_id; sxdp.sxdp_flags = xsk->config.bind_flags; + if (shared) + sxdp.sxdp_shared_umem_fd = umem->fd; + err = bind(xsk->fd, (struct sockaddr *), sizeof(sxdp)); if (err) { err = -errno; -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] compat: Add missing inline keyword
On Tue, Nov 05, 2019 at 02:14:24PM -0800, Greg Rose wrote: > The missing inline keyword before the definition of the > rpl_nf_ct_tmpl_free() function causes spurious warnings about the > function not being used on some older kernels. Add the keyword > to suppress the warning. > > Signed-off-by: Greg Rose The commit message and the patch make sense given my experience of GCC behavior. I haven't tested it. Acked-by: Ben Pfaff ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] vswitch.xml: Fix column for xdpmode.
On Tue, Nov 05, 2019 at 01:20:35PM -0800, Ben Pfaff wrote: > On Tue, Nov 05, 2019 at 09:54:08PM +0100, Ilya Maximets wrote: > > 'xdpmode' is part of 'options', not the 'other_config'. > > > > CC: William Tu > > Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.") > > Signed-off-by: Ilya Maximets > > It's not clear to me whether the options related to XDP and DPDK and PMD > consistently follow the intended convention. The intention is that > options is for settings that are specific to a particular configured > interface type, and other-config is for options independent of the > interface type. > > Acked-by: Ben Pfaff Thank you, I applied to master and backport to 2.12. William > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCHv4] ofproto-dpif: Expose datapath capability to ovsdb.
The patch adds support for fetching the datapath's capabilities from the result of 'check_support()', and write the supported capability to a new database column, called 'capabilities' under Datapath table. To see how it works, run: # ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev # ovs-vsctl -- --id=@m create Datapath datapath_version=0 \ 'ct_zones={}' 'capabilities={}' \ -- set Open_vSwitch . datapaths:"netdev"=@m # ovs-vsctl list-dp-cap netdev ufid=true sample_nesting=true clone=true tnl_push_pop=true \ ct_orig_tuple=true ct_eventmask=true ct_state=true \ ct_clear=true max_vlan_headers=1 recirc=true ct_label=true \ max_hash_alg=1 ct_state_nat=true ct_timeout=true \ ct_mark=true ct_orig_tuple6=true check_pkt_len=true \ masked_set_action=true max_mpls_depth=3 trunc=true ct_zone=true Signed-off-by: William Tu Tested-by: Greg Rose Reviewed-by: Greg Rose --- v4: rebase to master v3: fix 32-bit build, reported by Greg travis: https://travis-ci.org/williamtu/ovs-travis/builds/599276267 v2: rebase to master --- ofproto/ofproto-dpif.c | 51 +++ ofproto/ofproto-provider.h | 2 ++ ofproto/ofproto.c | 12 ofproto/ofproto.h | 2 ++ tests/ovs-vsctl.at | 10 ++ utilities/ovs-vsctl.8.in | 6 utilities/ovs-vsctl.c | 28 + vswitchd/bridge.c | 21 + vswitchd/vswitch.ovsschema | 5 ++- vswitchd/vswitch.xml | 76 ++ 10 files changed, 212 insertions(+), 1 deletion(-) diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index c35ec3e61592..fa73d06a82f4 100644 --- a/ofproto/ofproto-dpif.c +++ b/ofproto/ofproto-dpif.c @@ -5444,6 +5444,56 @@ ct_del_zone_timeout_policy(const char *datapath_type, uint16_t zone_id) } } +static void +get_datapath_cap(const char *datapath_type, struct smap *cap) +{ +char *str_value; +struct odp_support odp; +struct dpif_backer_support s; +struct dpif_backer *backer = shash_find_data(_dpif_backers, + datapath_type); +if (!backer) { +return; +} +s = backer->rt_support; +odp = s.odp; + +/* ODP_SUPPORT_FIELDS */ +str_value = xasprintf("%"PRIuSIZE, odp.max_vlan_headers); +smap_add(cap, "max_vlan_headers", str_value); +free(str_value); + +str_value = xasprintf("%"PRIuSIZE, odp.max_mpls_depth); +smap_add(cap, "max_mpls_depth", str_value); +free(str_value); + +smap_add(cap, "recirc", odp.recirc ? "true" : "false"); +smap_add(cap, "ct_state", odp.ct_state ? "true" : "false"); +smap_add(cap, "ct_zone", odp.ct_zone ? "true" : "false"); +smap_add(cap, "ct_mark", odp.ct_mark ? "true" : "false"); +smap_add(cap, "ct_label", odp.ct_label ? "true" : "false"); +smap_add(cap, "ct_state_nat", odp.ct_state_nat ? "true" : "false"); +smap_add(cap, "ct_orig_tuple", odp.ct_orig_tuple ? "true" : "false"); +smap_add(cap, "ct_orig_tuple6", odp.ct_orig_tuple6 ? "true" : "false"); + +/* DPIF_SUPPORT_FIELDS */ +smap_add(cap, "masked_set_action", s.masked_set_action ? "true" : "false"); +smap_add(cap, "tnl_push_pop", s.tnl_push_pop ? "true" : "false"); +smap_add(cap, "ufid", s.ufid ? "true" : "false"); +smap_add(cap, "trunc", s.trunc ? "true" : "false"); +smap_add(cap, "clone", s.clone ? "true" : "false"); +smap_add(cap, "sample_nesting", s.sample_nesting ? "true" : "false"); +smap_add(cap, "ct_eventmask", s.ct_eventmask ? "true" : "false"); +smap_add(cap, "ct_clear", s.ct_clear ? "true" : "false"); + +str_value = xasprintf("%"PRIuSIZE, s.max_hash_alg); +smap_add(cap, "max_hash_alg", str_value); +free(str_value); + +smap_add(cap, "check_pkt_len", s.check_pkt_len ? "true" : "false"); +smap_add(cap, "ct_timeout", s.ct_timeout ? "true" : "false"); +} + /* Gets timeout policy name in 'backer' based on 'zone', 'dl_type' and * 'nw_proto'. Returns true if the zone-based timeout policy is configured. * On success, stores the timeout policy name in 'tp_name', and sets @@ -6585,4 +6635,5 @@ const struct ofproto_class ofproto_dpif_class = { ct_flush, /* ct_flush */ ct_set_zone_timeout_policy, ct_del_zone_timeout_policy, +get_datapath_cap, }; diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h index c980e6b5..80c4fee06d01 100644 --- a/ofproto/ofproto-provider.h +++ b/ofproto/ofproto-provider.h @@ -1889,6 +1889,8 @@ struct ofproto_class { /* Deletes the timeout policy associated with 'zone' in datapath type * 'dp_type'. */ void (*ct_del_zone_timeout_policy)(const char *dp_type, uint16_t zone); +/* Get the datapath's capabilities. */ +void (*get_datapath_cap)(const char *dp_type, struct smap *caps); }; extern const struct ofproto_class ofproto_dpif_class; diff --git a/ofproto/ofproto.c
[ovs-dev] [PATCH RFC] netdev-afxdp: Enable shared umem support.
The RFC patch enables shared umem support. It requires kernel change and libbpf change, I will post it in another thread. I tested with multiple afxdp ports using skb mode. For example: ovs-vsctl -- set interface afxdp-p0 options:n_rxq=1 type="afxdp" options:xdpmode=skb ovs-vsctl -- set interface afxdp-p1 options:n_rxq=1 type="afxdp" options:xdpmode=skb Will share one umem instead of two. Note that once shared umem is created with a specific mode (ex: XDP_COPY), the netdev that shares this umem can not change its mode. So I'm thinking about using just one shared umem for all skb-mode netdevs, and for the rest drv-mode netdevs, keep using dedicated umem. Or create one umem per mode? So the drv-mode netdevs also share one umem. Any comments are welcome. Suggested-by: Eelco Chaudron Signed-off-by: William Tu --- lib/netdev-afxdp.c | 97 +++--- 1 file changed, 70 insertions(+), 27 deletions(-) diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c index 3037770b27cb..42767e1e27f3 100644 --- a/lib/netdev-afxdp.c +++ b/lib/netdev-afxdp.c @@ -95,6 +95,8 @@ static void xsk_destroy(struct xsk_socket_info *xsk); static int xsk_configure_all(struct netdev *netdev); static void xsk_destroy_all(struct netdev *netdev); +static struct xsk_umem_info *shared_umem; + struct unused_pool { struct xsk_umem_info *umem_info; int lost_in_rings; /* Number of packets left in tx, rx, cq and fq. */ @@ -112,6 +114,8 @@ struct xsk_umem_info { struct xsk_ring_cons cq; struct xsk_umem *umem; void *buffer; +int refcount; +struct ovs_mutex mutex; }; struct xsk_socket_info { @@ -228,6 +232,7 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode) uconfig.comp_size = CONS_NUM_DESCS; uconfig.frame_size = FRAME_SIZE; uconfig.frame_headroom = OVS_XDP_HEADROOM; +ovs_mutex_init(>mutex); ret = xsk_umem__create(>umem, buffer, size, >fq, >cq, ); @@ -296,6 +301,7 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex, struct xsk_socket_info *xsk; char devname[IF_NAMESIZE]; uint32_t idx = 0, prog_id; +bool shared; int ret; int i; @@ -304,6 +310,7 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex, cfg.rx_size = CONS_NUM_DESCS; cfg.tx_size = PROD_NUM_DESCS; cfg.libbpf_flags = 0; +shared = umem->refcount > 1; if (xdpmode == XDP_ZEROCOPY) { cfg.bind_flags = XDP_ZEROCOPY; @@ -319,6 +326,10 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex, } #endif +if (shared) { +cfg.bind_flags = XDP_SHARED_UMEM; +} + if (if_indextoname(ifindex, devname) == NULL) { VLOG_ERR("ifindex %d to devname failed (%s)", ifindex, ovs_strerror(errno)); @@ -352,6 +363,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex, return NULL; } +if (shared) { +return xsk; +} + +/* Only the first umem needs to go below. */ while (!xsk_ring_prod__reserve(>umem->fq, PROD_NUM_DESCS, )) { VLOG_WARN_RL(, "Retry xsk_ring_prod__reserve to FILL queue"); @@ -380,33 +396,43 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode, { struct xsk_socket_info *xsk; struct xsk_umem_info *umem; -void *bufs; +void *bufs = NULL; netdev_afxdp_sweep_unused_pools(NULL); -/* Umem memory region. */ -bufs = xmalloc_pagealign(NUM_FRAMES * FRAME_SIZE); -memset(bufs, 0, NUM_FRAMES * FRAME_SIZE); +if (!shared_umem) { +/* Umem memory region. */ +bufs = xmalloc_pagealign(NUM_FRAMES * FRAME_SIZE); +memset(bufs, 0, NUM_FRAMES * FRAME_SIZE); + +/* Create AF_XDP socket. */ +umem = xsk_configure_umem(bufs, + NUM_FRAMES * FRAME_SIZE, + xdpmode); +if (!umem) { +free_pagealign(bufs); +return NULL; +} -/* Create AF_XDP socket. */ -umem = xsk_configure_umem(bufs, - NUM_FRAMES * FRAME_SIZE, - xdpmode); -if (!umem) { -free_pagealign(bufs); -return NULL; +shared_umem = umem; +umem->refcount++; +VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem); +} else { +umem = shared_umem; +umem->refcount++; } -VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem); - xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode, use_need_wakeup); -if (!xsk) { +if (!xsk && !umem->refcount--) { /* Clean up umem and xpacket pool. */ +shared_umem = NULL; if (xsk_umem__delete(umem->umem)) { VLOG_ERR("xsk_umem__delete failed."); } -free_pagealign(bufs); +
[ovs-dev] [PATCH] compat: Add missing inline keyword
The missing inline keyword before the definition of the rpl_nf_ct_tmpl_free() function causes spurious warnings about the function not being used on some older kernels. Add the keyword to suppress the warning. Signed-off-by: Greg Rose --- datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h b/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h index bb3d794..4cce92f 100644 --- a/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h +++ b/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h @@ -33,7 +33,7 @@ out_free: } #define nf_ct_tmpl_alloc rpl_nf_ct_tmpl_alloc -static void rpl_nf_ct_tmpl_free(struct nf_conn *tmpl) +static inline void rpl_nf_ct_tmpl_free(struct nf_conn *tmpl) { nf_ct_ext_destroy(tmpl); nf_ct_ext_free(tmpl); -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [OVN][RAFT]Why left server cannot be added back to
On Tue, Nov 05, 2019 at 08:10:41PM +0800, taoyunupt wrote: > Hi,Numan, > When I run OVN/RAFT cluster, I found a server(which > initiative to leave or be kicked off ) ,cannot be added back to origin > cluster. I found the code as following, can you tell me the reason , many > thanks! > > > case RAFT_REC_NOTE: > if (!strcmp(r->note, "left")) { > return ovsdb_error(NULL, "record %llu indicates server has left " >"the cluster; it cannot be added back (use " >"\"ovsdb-tool join-cluster\" to add a new " >"server)", rec_idx); The Raft dissertation doesn't contemplate the possibility of a server re-joining a cluster. Allowing it would add new corner cases that aren't worth dealing with. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2] ip_gre: Remove even more unused code
On 11/5/2019 1:14 PM, Ben Pfaff wrote: On Mon, Nov 04, 2019 at 12:43:47PM -0800, Greg Rose wrote: There is a confusing mix of ipgre and gretap functions with some needed for gretap still having ipgre_ prefixes. This time though I think I got the rest of the unused ipgre code. Passes Travis here and this time I made sure the patch passing Travis is the same one I'm mailing. https://travis-ci.org/gvrose8192/ovs-experimental/builds/607296133 Fixes: d5822f428814 ("gre: Remove dead ipgre code") Signed-off-by: Greg Rose --- V2 - Send the right patch this time Great. Thanks, I applied this to master. Thanks Ben! ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC v2] bridge: Allow manual notifications about interfaces' updates.
On Tue, Nov 05, 2019 at 09:40:26PM +0100, Ilya Maximets wrote: > On 05.11.2019 20:20, Ben Pfaff wrote: > > On Mon, Nov 04, 2019 at 02:22:34PM +0100, Ilya Maximets wrote: > > > Sometimes interface updates could happen in a way ifnotifier is not > > > able to catch. For example some heavy operations (device reset) in > > > netdev-dpdk could require re-applying of the bridge configuration. > > > > > > For this purpose new manual notifier introduced. Its function > > > 'if_notifier_manual_report()' could be called directly by the code > > > that aware about changes. This new notifier is thread safe. > > > > > > Signed-off-by: Ilya Maximets > > > --- > > > > > > Sending this as RFC that might be useful in context of the following > > > patch: > > > https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364155.html > > > > > > It will allow us to not introduce heavy dpdk notifier just to update > > > one sequence number on RTE_ETH_EVENT_INTR_RESET events. We could also > > > use it to report other changes from DPDK, e.g. LSC interrupts. > > > > I *think* I understand what's going on here. It's that the proposed > > dpdk-notifier > > (https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364154.html) > > takes more locks, etc., so it's going to be slow. This one doesn't so > > it's cheaper. This one, however, will only work if the OVS code that > > makes the device changes calls into it, whereas the other one gets > > notified by DPDK itself. > > Both implementations doesn't receive notifications from DPDK itself. > dpdk-notifier is just a more complex variant of the same thing with > dynamic allocations and list of notifiers. In practice, the only > way to trigger it is to call dpdk_notifierr_report_link() from the > OVS code. > > The part that really receives DPDK events is 'dpdk_eth_event_callback()' > from https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364155.html > > There is a possibility to make everything super right (not sure). > This should look like this: > > * Allow more than one type of notifiers. i.e. introduce > ifnotifier-provider.h, make a common interface in ifnotifier.{c,h}. > Move code of the existing notifiers to ifnotifier-{linux,bsd}.{c.h} > and make them both implement ifnotifier with type='system'. > > * Introduce dpdk_notifier (analogue of rtnetlink_notifier) that will > subscribe itself to DPDK events with rte_eth_dev_callback_register() > and receive actual DPDK events via callback. > > * Introduce ifnotifier-dpdk.{c,h} that will register itself as ifnotifier > with type='dpdk'. Subscribe ifnotifier-dpdk to dpdk_notifier with > dpdk_notifier_create(). > > * bridge.c should subscribe itself to all registered types of ifnotifiers > to increase 'ifaces_changed' sequence number. > > * netdev-dpdk.c should subscribe to dpdk_notifier to receive DPDK > events and perform required actions on devices. > > * Something else to resolve issues with DPDK intialization. > (we can't register dpdk callbacks before dpdk initialization that > could happen way after bridge initialization) > > I'm not sure if it's really possible to properly implement, but it looks > like an over-engineering anyway. > > BTW, I'm not sure if any of our solutions are good enough. OK. Thanks for all the details. Given that we have three solutions, of which it's possible none may be "good enough", I favor the simplest (the "manual" one). > > I think that this one could avoid introducing a mutex by using an atomic > > pointer, but I don't know whether that's worthwhile. > > > > The case is that we need to be sure that 'ifaces_changed' sequence number > will not be destroyed while (before, actually) we're using it. > >Main thread Notification thread > >create seq >set cb <-- func > notification appeared > func = read cb > func() >set cb <-- NULL >destroy seq > seq_change() // change of destroyed seq will lead to > crash. > > > So, the mutex ensures that callback is still registered while we're > executing it. Ah, OK. That makes sense. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] vswitch.xml: Fix column for xdpmode.
On Tue, Nov 05, 2019 at 09:54:08PM +0100, Ilya Maximets wrote: > 'xdpmode' is part of 'options', not the 'other_config'. > > CC: William Tu > Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.") > Signed-off-by: Ilya Maximets It's not clear to me whether the options related to XDP and DPDK and PMD consistently follow the intended convention. The intention is that options is for settings that are specific to a particular configured interface type, and other-config is for options independent of the interface type. Acked-by: Ben Pfaff ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2] ip_gre: Remove even more unused code
On Mon, Nov 04, 2019 at 12:43:47PM -0800, Greg Rose wrote: > There is a confusing mix of ipgre and gretap functions with some > needed for gretap still having ipgre_ prefixes. This time though > I think I got the rest of the unused ipgre code. > > Passes Travis here and this time I made sure the patch passing > Travis is the same one I'm mailing. > https://travis-ci.org/gvrose8192/ovs-experimental/builds/607296133 > > Fixes: d5822f428814 ("gre: Remove dead ipgre code") > Signed-off-by: Greg Rose > > --- > V2 - Send the right patch this time Great. Thanks, I applied this to master. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] compat: Add compat fix for old kernels
On Mon, Nov 04, 2019 at 10:11:05AM +0200, Roi Dayan wrote: > In kernels older than 4.8, struct tcf_t didn't have the firstuse. > If openvswitch is compiled with the compat pkt_cls.h then there is > a struct size mismatch between openvswitch and the kernel which cause > parsing netlink actions to fail. > After this commit parsing the netlink actions pass even if compiled with > the compat pkt_cls.h. > > Signed-off-by: Roi Dayan > --- > acinclude.m4| 8 > include/linux/pkt_cls.h | 2 ++ > 2 files changed, 10 insertions(+) > > diff --git a/acinclude.m4 b/acinclude.m4 > index a0507cfe019e..12513b82747d 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -186,6 +186,14 @@ AC_DEFUN([OVS_CHECK_LINUX_TC], [ > [Define to 1 if TCA_FLOWER_KEY_ENC_IP_TTL_MASK is > available.])]) > >AC_COMPILE_IFELSE([ > +AC_LANG_PROGRAM([#include ], [ > +struct tcf_t x; > + x.firstuse = 1; > +])], > +[AC_DEFINE([HAVE_TCF_T_FIRSTUSE], [1], > + [Define to 1 if struct tcf_t have firstuse.])]) > + I think you can use AC_CHECK_MEMBERS: AC_CHECK_MEMBERS([struct tcf_t.firstuse], [], [], [#include ]) ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] vswitch.xml: Fix column for xdpmode.
'xdpmode' is part of 'options', not the 'other_config'. CC: William Tu Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.") Signed-off-by: Ilya Maximets --- vswitchd/vswitch.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 00c6bd2d4..efdfb83bb 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -3107,7 +3107,7 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ - -- 2.17.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC v2] bridge: Allow manual notifications about interfaces' updates.
On 05.11.2019 20:20, Ben Pfaff wrote: On Mon, Nov 04, 2019 at 02:22:34PM +0100, Ilya Maximets wrote: Sometimes interface updates could happen in a way ifnotifier is not able to catch. For example some heavy operations (device reset) in netdev-dpdk could require re-applying of the bridge configuration. For this purpose new manual notifier introduced. Its function 'if_notifier_manual_report()' could be called directly by the code that aware about changes. This new notifier is thread safe. Signed-off-by: Ilya Maximets --- Sending this as RFC that might be useful in context of the following patch: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364155.html It will allow us to not introduce heavy dpdk notifier just to update one sequence number on RTE_ETH_EVENT_INTR_RESET events. We could also use it to report other changes from DPDK, e.g. LSC interrupts. I *think* I understand what's going on here. It's that the proposed dpdk-notifier (https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364154.html) takes more locks, etc., so it's going to be slow. This one doesn't so it's cheaper. This one, however, will only work if the OVS code that makes the device changes calls into it, whereas the other one gets notified by DPDK itself. Both implementations doesn't receive notifications from DPDK itself. dpdk-notifier is just a more complex variant of the same thing with dynamic allocations and list of notifiers. In practice, the only way to trigger it is to call dpdk_notifierr_report_link() from the OVS code. The part that really receives DPDK events is 'dpdk_eth_event_callback()' from https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364155.html There is a possibility to make everything super right (not sure). This should look like this: * Allow more than one type of notifiers. i.e. introduce ifnotifier-provider.h, make a common interface in ifnotifier.{c,h}. Move code of the existing notifiers to ifnotifier-{linux,bsd}.{c.h} and make them both implement ifnotifier with type='system'. * Introduce dpdk_notifier (analogue of rtnetlink_notifier) that will subscribe itself to DPDK events with rte_eth_dev_callback_register() and receive actual DPDK events via callback. * Introduce ifnotifier-dpdk.{c,h} that will register itself as ifnotifier with type='dpdk'. Subscribe ifnotifier-dpdk to dpdk_notifier with dpdk_notifier_create(). * bridge.c should subscribe itself to all registered types of ifnotifiers to increase 'ifaces_changed' sequence number. * netdev-dpdk.c should subscribe to dpdk_notifier to receive DPDK events and perform required actions on devices. * Something else to resolve issues with DPDK intialization. (we can't register dpdk callbacks before dpdk initialization that could happen way after bridge initialization) I'm not sure if it's really possible to properly implement, but it looks like an over-engineering anyway. BTW, I'm not sure if any of our solutions are good enough. I think that this one could avoid introducing a mutex by using an atomic pointer, but I don't know whether that's worthwhile. The case is that we need to be sure that 'ifaces_changed' sequence number will not be destroyed while (before, actually) we're using it. Main thread Notification thread create seq set cb <-- func notification appeared func = read cb func() set cb <-- NULL destroy seq seq_change() // change of destroyed seq will lead to crash. So, the mutex ensures that callback is still registered while we're executing it. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ] travis: support ppc64le builds
Hi Wei If I change my matrix:include to use "arch: ppc64le" rather than "os: linux-ppc64le", will it eliminate your concern? matrix: include: - - os: linux-ppc64le + - arch: ppc64le compiler: gcc env: OPTS="--disable-ssl" Later when we want to enable the full matrix on all arch we would add: 1) arch: - amd64 - ppc64le - arm64 2) eliminate the include: -arch: ppc64le 3) Add any exclude: - arch:XXX sections for any tests that dont apply. For example I would add: exclude: - arch: ppc64le env: M32=1 OPTS="--disable-ssl" Regards, David On 2019-11-02 02:09, Yanqin Wei (Arm Technology China) wrote: Hi David, Thanks for your reply. Yes, my concern is how to define arch and os in .travis.yml after we cover all builds and cases for arm and ppc. This pattern can enable all builds and testsuits for x86 and arm arch: - amd64 - arm64 os: - linux This can enable all jobs for x86 and ppc. arch: - amd64 //default os: - linux - linux-ppc64le But it does not work to combine them. This means four kinds of arch+os combinations in all. Arm64+linux-ppc64le is invalid. arch: - amd64 - arm64 os: - linux - linux-ppc64le So if we finally cover all the builds and cases for arm/ppc, we have to duplicate all jobs for different cpu arch in the matrix include. matrix: include: - os: linux-ppc64le env: job1 - os: linux-ppc64le env: job2 ... - arch: arm64 env: job1 - arch: arm64 env: job2 ... But currently either arm or ppc has not cover all the cases, so they can coexist in build-matrix. And there is no conflict in build/prepare scripts, because both of them use TRAVIS_ARCH variable to indicate cpu arch. The patch series to enable arm CI is under internal review. It will be submitted when ready. Best Regards, Wei Yanqin -Original Message- From: dwilder Sent: Saturday, November 2, 2019 1:09 AM To: Yanqin Wei (Arm Technology China) Cc: Ilya Maximets ; ovs-dev@openvswitch.org; wil...@us.ibm.com; nd Subject: Re: RE: [ovs-dev] [PATCH ] travis: support ppc64le builds On 2019-10-30 19:04, Yanqin Wei (Arm Technology China) wrote: Hi, We are working to support arm64 build for ovs travis CI. It is indeed to use arch: arm64 to choose cpu architecture, because travis has provided native arm64 option now. But in this patch it seems ppc64 builds run on the ppc-VM + x86 native machine. Currently arm only select a part of jobs to run, which is defined in matrix:include. But the final object is to run all jobs. It means that arch: arm64 will be moved out of marix. If ppc plans to do the same in the future, it will conflict with arm jobs. Best Regards, Wei Yanqin Hi, I have added a build only test for ppc64le following the model used for osx. I think this is a good start for getting multi-arch support into Ci. I agree that running all jobs on the matrix on every arch is good goal. I dont completely understand your issue, is your concern the use of os: vs arch: ? I am glad to work with you to find a solution. Can you share your arm64 changes? We can discuss off-list if you prefer. -Original Message- From: ovs-dev-boun...@openvswitch.org On Behalf Of dwilder Sent: Wednesday, October 30, 2019 1:55 AM To: Ilya Maximets Cc: ovs-dev@openvswitch.org; wil...@us.ibm.com Subject: Re: [ovs-dev] [PATCH ] travis: support ppc64le builds On 2019-10-29 09:52, Ilya Maximets wrote: On 28.10.2019 22:22, David Wilder wrote: Add support for travis-ci ppc64le builds. - Updated matrix in .travis.yml to include a ppc64le build. - Added support to install packages needed by specific architectures. To keep the total build time at an acceptable level only a single build job is included in the matrix for ppc64le. A build report example can be found here [1] [0] https://urldefense.proofpoint.com/v2/url?u=http-3A__travis-2Dci.org_; d=DwICaQ=jf_iaSHvJObTbx-siA1ZOg=7ndxyKjH9UrBD68Us2WP1wI4BwEBQbeAy z8i_vwCCaI=6JANehIfGoxUMtwHhe4yob4UPeby0Y8ovgzRDIyJZFo=UMYL8rzJNp h87seC0oJLBiWoe-sUSL80AJy0RMTgBzQ= [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__travis-2Dci.org_ djlwilder_ovs_builds_604098141=DwICaQ=jf_iaSHvJObTbx-siA1ZOg=7n dxyKjH9UrBD68Us2WP1wI4BwEBQbeAyz8i_vwCCaI=6JANehIfGoxUMtwHhe4yob4UP eby0Y8ovgzRDIyJZFo=pyd2yQpQ0snpwGE5El4RYZsatwl74sthM1KLqtIKCnY= Signed-off-by: David Wilder Hi David, Thanks for working on this. I have a couple of question regarding ppc64le support by TravisCI. It seems that they are not supporting this architecture officially and refusing[1] to solve any issues that appears while using it. There also no official documentation. It's kind of a hidden feature that some projects are using for their own risk [2]. Do you know why this happens or maybe you have some insights about what is going on/how it works? Work is going on to increase ppc64le support on Travis by the end of the year. I dont have any details yet. My plan is to keep this to build-only ci until then. Important, ppc64le
Re: [ovs-dev] [ovs-discuss] gso packet is failing with af_packet socket with packet_vnet_hdr
Hi Flavio, As per your inputs, I modified the gso_size, and now skb_gso_validate_mtu(skb, mtu) is returning true, and ip_finish_output2(sk, skb) and dst_neigh_output(dst, neigh, skb); are getting called. But still, I am seeing the large packets getting dropped somewhere in the kernel down the line and retransmission happening. if (skb_gso_validate_mtu(skb, mtu)) return ip_finish_output2(sk, skb); [ 1854.905733] vxlan_xmit:2262 skb->len:2776 packet_length:2762 [ 1854.905744] skb_gso_size_check:4478 and seg_len:1500 and max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535 The gso_size 1398 bytes is correct in my case ( 1398 + 50 (vxlan header) + 20(IP) + TCP(32) + 14(ETH) = 1514 bytes) The code is simple: vnet = buf; // buf is an array of 64k bytes len = 0; if (csum) { vnet->flags = (VIRTIO_NET_HDR_F_NEEDS_CSUM); vnet->csum_start = (ETH_HLEN + sizeof(*iph)); vnet->csum_offset = (__builtin_offsetof(struct tcphdr, check)); } if (gso) { vnet->hdr_len = (ETH_HLEN + sizeof(*iph) + sizeof(*tcph)); vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4; vnet->gso_size = ( ETH_DATA_LEN - 50 - sizeof(struct iphdr) - sizeof(struct tcphdr)); // 50 is the vxlan header } else { vnet->gso_type = VIRTIO_NET_HDR_GSO_NONE; vnet->gso_size = 0; } len =sizeof(*vnet); // Now copying the entire L2 packet into the buf starting at an offset buf + len and sending the packet. Did I miss something? And I am not sure how OVS behaves after receiving this packet and before transmitting to vxlan. How is checksum offloading happening with af_packet in OVS? Does OVS have any role in this? Please see the attached image for reference. The packet flow with in the host is given below: Ubuntu container (eth0 (1500MTU))--routing lookup-->Ubuntu container(veth0(1450 MTU)) ->OVS(veth1(1450MTU))->vxlan(65K MTU)->eth0(physical interface(1500MTU))->other machine. Looking forward to your reply. Regards, Ramana On Mon, Nov 4, 2019 at 10:41 PM Ramana Reddy wrote: > Thanks, Flavio. I will check it out tomorrow and let you know how it goes. > > Regards, > Ramana > > > On Mon, Nov 4, 2019 at 10:15 PM Flavio Leitner wrote: > >> On Mon, 4 Nov 2019 21:32:28 +0530 >> Ramana Reddy wrote: >> >> > Hi Favio Leitner, >> > Thank you very much for your reply. Here is the code snippet. But the >> > same code is working if I send the packet without ovs. >> >> Could you provide more details on the OvS environment and the test? >> >> The linux kernel propagates the header size dependencies when you stack >> the devices in net_device->hard_header_len, so in the case of vxlan dev >> it will be: >> >> needed_headroom = lowerdev->hard_header_len; >> needed_headroom += VXLAN_HEADROOM; >> dev->needed_headroom = needed_headroom; >> >> Sounds like that is helping when OvS is not being used. >> >> fbl >> >> >> > bool csum = true; >> > bool gso = true' >> > struct virtio_net_hdr *vnet = buf; >> >if (csum) { >> > vnet->flags = (VIRTIO_NET_HDR_F_NEEDS_CSUM); >> > vnet->csum_start = ETH_HLEN + sizeof(*iph); >> > vnet->csum_offset = __builtin_offsetof(struct >> > tcphdr, check); >> > } >> > >> > if (gso) { >> > vnet->hdr_len = ETH_HLEN + sizeof(*iph) + >> > sizeof(*tcph); >> > vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4; >> > vnet->gso_size = ETH_DATA_LEN - sizeof(struct >> > iphdr) - >> > sizeof(struct >> > tcphdr); >> > } else { >> > vnet->gso_type = VIRTIO_NET_HDR_GSO_NONE; >> > } >> > Regards, >> > Ramana >> > >> > >> > On Mon, Nov 4, 2019 at 8:39 PM Flavio Leitner >> > wrote: >> > >> > > >> > > Hi, >> > > >> > > What's the value you're passing on gso_size in struct >> > > virtio_net_hdr? You need to leave room for the encapsulation >> > > header, e.g.: >> > > >> > > gso_size = iface_mtu - virtio_net_hdr->hdr_len >> > > >> > > fbl >> > > >> > > On Mon, 4 Nov 2019 01:11:36 +0530 >> > > Ramana Reddy wrote: >> > > >> > > > Hi, >> > > > I am wondering if anyone can help me with this. I am having >> > > > trouble to send tso/gso packet >> > > > with af_packet socket with packet_vnet_hdr (through >> > > > virtio_net_hdr) over vxlan tunnel in OVS. >> > > > >> > > > What I observed that, the following function eventually hitting >> > > > and is returning false (net/core/skbuff.c), hence the packet is >> > > > dropping. static inline bool skb_gso_size_check(const
Re: [ovs-dev] [RFC v2] bridge: Allow manual notifications about interfaces' updates.
On Mon, Nov 04, 2019 at 02:22:34PM +0100, Ilya Maximets wrote: > Sometimes interface updates could happen in a way ifnotifier is not > able to catch. For example some heavy operations (device reset) in > netdev-dpdk could require re-applying of the bridge configuration. > > For this purpose new manual notifier introduced. Its function > 'if_notifier_manual_report()' could be called directly by the code > that aware about changes. This new notifier is thread safe. > > Signed-off-by: Ilya Maximets > --- > > Sending this as RFC that might be useful in context of the following patch: > https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364155.html > > It will allow us to not introduce heavy dpdk notifier just to update > one sequence number on RTE_ETH_EVENT_INTR_RESET events. We could also > use it to report other changes from DPDK, e.g. LSC interrupts. I *think* I understand what's going on here. It's that the proposed dpdk-notifier (https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364154.html) takes more locks, etc., so it's going to be slow. This one doesn't so it's cheaper. This one, however, will only work if the OVS code that makes the device changes calls into it, whereas the other one gets notified by DPDK itself. I think that this one could avoid introducing a mutex by using an atomic pointer, but I don't know whether that's worthwhile. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC] jsonrpc: make jsonrpc input_buffer size parametric
On Tue, Nov 05, 2019 at 07:42:51PM +0200, Lorenzo Bianconi wrote: > > > > On Tue, Nov 05, 2019 at 04:27:51PM +0200, Lorenzo Bianconi wrote: > > > Allow jsonrpc clients (e.g. ovn-controller) to specify jsonrpc input > > > buffer size in order to reduce overhead when downloading huge db size > > > since current value is 512B. The user can specify rpc buffer size using > > > ovsdb_idl_set_remote routine passing requested value > > > > > > Signed-off-by: Lorenzo Bianconi > > > > Hmm, I thought that we had decided to just try a 4096-byte buffer to > > start. That would be a much smaller patch. The marginal benefits of > > buffers larger than that probably decline a lot since the processing > > cost of 4096 bytes of JSON is probably a lot more than the overhead of a > > system call. I imagine that any further benefit is probably from being > > able to process more JSON per trip through the main loop. Those same > > benefits could also be obtained by increasing the 'for' loop limit in > > jsonrpc_recv() from 50 to some larger number. > > Hi Ben, > > thanks for the review. I tried to get a PoC (this is why I sent it as > RFC) but since the IDL/reconnect code is pretty spread > I ended up with a large patch. I agree we can just set the bufsize to 4096. > If it is ok for everybody I will post a patch to set the buffer size > to one page. I suspect that will be fine. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] system-ovn.at: Create IPv6 load balancing tests
On Tue, Nov 05, 2019 at 01:06:13PM -0500, Russell Bryant wrote: > On Tue, Nov 5, 2019 at 12:38 PM Ben Pfaff wrote: > > > On Tue, Nov 05, 2019 at 12:23:09PM -0500, Russell Bryant wrote: > > > Duplicate all of the IPv4 load balancing test cases for IPv6. > > > All of these are passing without any changes needed in OVN code, but > > > this will help ensure that we do not have any IPv6 load balancing > > > regressions in the future. > > > > > > Signed-off-by: Russell Bryant > > > > > +# > > > +# A note on square brackets and IPv6 ... > > > +# > > > +# To get square brackets to not get interpreted by m4, this file is > > using: > > > +# > > > +# For [ --> @<:@ > > > +# For ] --> @:>@ > > > +# > > > +# > > https://stackoverflow.com/questions/2308721/how-do-i-escape-text-in-autoconf-m4 > > > > You can usually get the same effect by just doubling the brackets, > > i.e. [[::1]]. You can find some examples with "git grep -F '[[::'". In > > some cases you end up with three sets of brackets because of outer > > quoting, e.g.: > > > > CHECK_STREAM_OPEN_BLOCK([tcp6], [[[::1]]]) > > > > Thanks! I'll give that a shot. That would be a lot more readable than > what I did ... Yeah, the quadigraphs look terrible. They're slightly better than C trigraphs, but not a lot. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] system-ovn.at: Create IPv6 load balancing tests
Bleep bloop. Greetings Russell Bryant, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Line is 184 characters long (recommended limit is 79) #757 FILE: p-set-addresses:1998: tcp,orig=(src=172.16.1.3,dst=30.0.0.1,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.1.3,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 184 characters long (recommended limit is 79) #758 FILE: p-set-addresses:1999: tcp,orig=(src=172.16.1.3,dst=30.0.0.1,sport=,dport=),reply=(src=192.168.2.2,dst=172.16.1.3,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 185 characters long (recommended limit is 79) #774 FILE: p-set-addresses:2006: tcp,orig=(src=172.16.1.3,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=20.0.0.2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 185 characters long (recommended limit is 79) #775 FILE: p-set-addresses:2007: tcp,orig=(src=172.16.1.3,dst=192.168.2.2,sport=,dport=),reply=(src=192.168.2.2,dst=20.0.0.2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 105 characters long (recommended limit is 79) #931 FILE: p-set-addresses:2167: NS_CHECK_EXEC([alice1], [wget http://@<:@fd30::1@:>@ -t 5 -T 1 --retry-connrefused -v -o wget$i.log]) WARNING: Line is 173 characters long (recommended limit is 79) #940 FILE: p-set-addresses:2173: tcp,orig=(src=fd72::3,dst=fd30::1,sport=,dport=),reply=(src=fd11::2,dst=fd72::3,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 173 characters long (recommended limit is 79) #941 FILE: p-set-addresses:2174: tcp,orig=(src=fd72::3,dst=fd30::1,sport=,dport=),reply=(src=fd12::2,dst=fd72::3,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 173 characters long (recommended limit is 79) #950 FILE: p-set-addresses:2180: tcp,orig=(src=fd72::3,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd20::2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 173 characters long (recommended limit is 79) #951 FILE: p-set-addresses:2181: tcp,orig=(src=fd72::3,dst=fd12::2,sport=,dport=),reply=(src=fd12::2,dst=fd20::2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 89 characters long (recommended limit is 79) #971 FILE: p-set-addresses:2348: -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ WARNING: Line is 106 characters long (recommended limit is 79) #1047 FILE: p-set-addresses:2424: ovn-nbctl set load_balancer $uuid vips:'"@<:@fd72::11@:>@:8000"'='"@<:@fd01::2@:>@:80,@<:@fd02::2@:>@:80"' WARNING: Line is 106 characters long (recommended limit is 79) #1061 FILE: p-set-addresses:2438: NS_CHECK_EXEC([alice1], [wget http://@<:@fd72::10@:>@ -t 5 -T 1 --retry-connrefused -v -o wget$i.log]) WARNING: Line is 80 characters long (recommended limit is 79) #1065 FILE: p-set-addresses:2442: AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd72::10) | grep -v fe80 | WARNING: Line is 174 characters long (recommended limit is 79) #1067 FILE: p-set-addresses:2444: tcp,orig=(src=fd72::2,dst=fd72::10,sport=,dport=),reply=(src=fd01::2,dst=fd72::2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 174 characters long (recommended limit is 79) #1068 FILE: p-set-addresses:2445: tcp,orig=(src=fd72::2,dst=fd72::10,sport=,dport=),reply=(src=fd02::2,dst=fd72::2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 111 characters long (recommended limit is 79) #1074 FILE: p-set-addresses:2451: NS_CHECK_EXEC([alice1], [wget http://@<:@fd72::11@:>@:8000 -t 5 -T 1 --retry-connrefused -v -o wget$i.log]) WARNING: Line is 80 characters long (recommended limit is 79) #1078 FILE: p-set-addresses:2455: AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd72::11) | grep -v fe80 | WARNING: Line is 174 characters long (recommended limit is 79) #1080 FILE: p-set-addresses:2457: tcp,orig=(src=fd72::2,dst=fd72::11,sport=,dport=),reply=(src=fd01::2,dst=fd72::2,sport=,dport=),zone=,protoinfo=(state=) WARNING: Line is 174 characters long (recommended limit is 79) #1081 FILE: p-set-addresses:2458: tcp,orig=(src=fd72::2,dst=fd72::11,sport=,dport=),reply=(src=fd02::2,dst=fd72::2,sport=,dport=),zone=,protoinfo=(state=) Lines checked: 1105, Warnings: 19, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn] Fix ha chassis failover issues for stale ha chassis entries
From: Numan Siddique If ha chassis rows of an HA chassis group become stale i.e the HA_Chassis.chassis column is empty (because ovn-controller is not running in that chassis) except one row and when ha_chassis_group_is_active() is called on that ovn-controller, then it returns false. Ideally it should become active since its the only active chassis. This patch fixes this issue. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1762777 Reported-by: Daniel Alvarez Signed-off-by: Numan Siddique --- controller/ha-chassis.c | 22 ++ tests/ovn.at| 20 +++- 2 files changed, 41 insertions(+), 1 deletion(-) diff --git a/controller/ha-chassis.c b/controller/ha-chassis.c index 6d9426a5c..46d5b50fc 100644 --- a/controller/ha-chassis.c +++ b/controller/ha-chassis.c @@ -142,6 +142,20 @@ ha_chassis_destroy_ordered(struct ha_chassis_ordered *ordered_ha_ch) } } +/* Returns the number of ha_chassis whose chassis column is + * set. */ +static size_t +get_active_ha_chassis(const struct sbrec_ha_chassis_group *ha_ch_grp) +{ +size_t n_active_ha_chassis = 0; +for (size_t i = 0; i < ha_ch_grp->n_ha_chassis; i++) { +if (ha_ch_grp->ha_chassis[i]->chassis) { +n_active_ha_chassis++; +} +} + +return n_active_ha_chassis; +} /* Returns true if the local_chassis is the master of * the HA chassis group, false otherwise. */ @@ -159,6 +173,14 @@ ha_chassis_group_is_active( return (ha_ch_grp->ha_chassis[0]->chassis == local_chassis); } +if (get_active_ha_chassis(ha_ch_grp) == 1) { +/* Only 1 ha_chassis of this chassis group has the 'chasss' + * column set, which can only be true for this 'local_chassis'. + * So return true - as the local_chassis can be the master of + * this HA chassis group. */ +return true; +} + if (sset_is_empty(active_tunnels)) { /* If active tunnel sset is empty, it means it has lost * connectivity with other chassis. */ diff --git a/tests/ovn.at b/tests/ovn.at index 410f4b514..cb7903db8 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -13413,7 +13413,25 @@ OVS_WAIT_UNTIL( logical_port=ls1-lp_ext1` test "$chassis" = "$hv1_uuid"]) -OVN_CLEANUP([hv1],[hv2],[hv3]) +# Stop ovn-controllers on hv1 and hv3. +as hv1 ovn-appctl -t ovn-controller exit +as hv3 ovn-appctl -t ovn-controller exit + +# hv2 should be master and claim ls1-lp_ext1 +OVS_WAIT_UNTIL( +[chassis=`ovn-sbctl --bare --columns chassis find port_binding \ +logical_port=ls1-lp_ext1` +test "$chassis" = "$hv2_uuid"]) + +as hv1 +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as hv3 +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +OVN_CLEANUP([hv2]) AT_CLEANUP AT_SETUP([ovn -- Address Set Incremental Processing]) -- 2.23.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v14] Improved Packet Drop Statistics in OVS
Comments inline. Best regards, Ilya Maximets. On 05.11.2019 5:35, Anju Thomas via dev wrote: Currently OVS maintains explicit packet drop/error counters only on port level. Packets that are dropped as part of normal OpenFlow processing are counted in flow stats of “drop” flows or as table misses in table stats. These can only be interpreted by controllers that know the semantics of the configured OpenFlow pipeline. Without that knowledge, it is impossible for an OVS user to obtain e.g. the total number of packets dropped due to OpenFlow rules. Furthermore, there are numerous other reasons for which packets can be dropped by OVS slow path that are not related to the OpenFlow pipeline. The generated datapath flow entries include a drop action to avoid further expensive upcalls to the slow path, but subsequent packets dropped by the datapath are not accounted anywhere. Finally, the datapath itself drops packets in certain error situations. Also, these drops are today not accounted for.This makes it difficult for OVS users to monitor packet drop in an OVS instance and to alert a management system in case of a unexpected increase of such drops. Also OVS trouble-shooters face difficulties in analysing packet drops. With this patch we implement following changes to address the issues mentioned above. 1. Identify and account all the silent packet drop scenarios 2. Display these drops in ovs-appctl coverage/show Co-authored-by: Rohith Basavaraja Co-authored-by: Keshav Gupta Signed-off-by: Anju Thomas Signed-off-by: Rohith Basavaraja Signed-off-by: Keshav Gupta Acked-by: Eelco Chaudron Acked-by: Eelco Chaudron --- v14 (Eelco's comments) 1. Remove definition of dpif_show_drop_stats_support 2. Remove extra check for drop reason in odp_execute_actions v13(Eelco and Illya's comments) 1. packet_dropped changed to packets_dropped 2. Uses dp_packet_batch_size instead of packet->size 3. Add version history v12(Ben's comments) 1. new dp action in kernel block in openvswitch.h 2. change xlate_error enum to be used as u32 3. resetting xlate error in case of congestion drop and forwarding disabled datapath/linux/compat/include/linux/openvswitch.h | 1 + lib/dpif-netdev.c | 47 - lib/dpif.c| 7 + lib/dpif.h| 2 + lib/odp-execute.c | 78 lib/odp-util.c| 9 + ofproto/ofproto-dpif-ipfix.c | 1 + ofproto/ofproto-dpif-sflow.c | 1 + ofproto/ofproto-dpif-xlate.c | 40 - ofproto/ofproto-dpif-xlate.h | 3 + ofproto/ofproto-dpif.c| 8 + ofproto/ofproto-dpif.h| 8 +- tests/automake.mk | 3 +- tests/dpif-netdev.at | 8 + tests/drop-stats.at | 210 ++ tests/ofproto-dpif.at | 2 +- tests/tunnel-push-pop.at | 29 ++- tests/tunnel.at | 16 +- 18 files changed, 462 insertions(+), 11 deletions(-) create mode 100644 tests/drop-stats.at diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h index 7b16b1d..798b5aa 100644 --- a/datapath/linux/compat/include/linux/openvswitch.h +++ b/datapath/linux/compat/include/linux/openvswitch.h @@ -993,6 +993,7 @@ enum ovs_action_attr { #ifndef __KERNEL__ OVS_ACTION_ATTR_TUNNEL_PUSH, /* struct ovs_action_push_tnl*/ OVS_ACTION_ATTR_TUNNEL_POP,/* u32 port number. */ + OVS_ACTION_ATTR_DROP, Description of an argument required here. And the fuller action description above in the comment to this enum. #endif __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted * from userspace. */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 4720ba1..8835158 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -102,6 +102,17 @@ enum { MAX_METERS = 65536 };/* Maximum number of meters. */ enum { MAX_BANDS = 8 }; /* Maximum number of bands / meter. */ enum { N_METER_LOCKS = 64 };/* Maximum number of meters. */ +COVERAGE_DEFINE(datapath_drop_meter); +COVERAGE_DEFINE(datapath_drop_upcall_error); +COVERAGE_DEFINE(datapath_drop_lock_error); +COVERAGE_DEFINE(datapath_drop_userspace_action_error); +COVERAGE_DEFINE(datapath_drop_tunnel_push_error); +COVERAGE_DEFINE(datapath_drop_tunnel_pop_error); +COVERAGE_DEFINE(datapath_drop_recirc_error); +COVERAGE_DEFINE(datapath_drop_invalid_port); +COVERAGE_DEFINE(datapath_drop_invalid_tnl_port); +COVERAGE_DEFINE(datapath_drop_rx_invalid_packet); + /* Protects against changes to
Re: [ovs-dev] [PATCH ovn] system-ovn.at: Create IPv6 load balancing tests
On Tue, Nov 5, 2019 at 12:38 PM Ben Pfaff wrote: > On Tue, Nov 05, 2019 at 12:23:09PM -0500, Russell Bryant wrote: > > Duplicate all of the IPv4 load balancing test cases for IPv6. > > All of these are passing without any changes needed in OVN code, but > > this will help ensure that we do not have any IPv6 load balancing > > regressions in the future. > > > > Signed-off-by: Russell Bryant > > > +# > > +# A note on square brackets and IPv6 ... > > +# > > +# To get square brackets to not get interpreted by m4, this file is > using: > > +# > > +# For [ --> @<:@ > > +# For ] --> @:>@ > > +# > > +# > https://stackoverflow.com/questions/2308721/how-do-i-escape-text-in-autoconf-m4 > > You can usually get the same effect by just doubling the brackets, > i.e. [[::1]]. You can find some examples with "git grep -F '[[::'". In > some cases you end up with three sets of brackets because of outer > quoting, e.g.: > > CHECK_STREAM_OPEN_BLOCK([tcp6], [[[::1]]]) > Thanks! I'll give that a shot. That would be a lot more readable than what I did ... -- Russell Bryant ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC] jsonrpc: make jsonrpc input_buffer size parametric
> > On Tue, Nov 05, 2019 at 04:27:51PM +0200, Lorenzo Bianconi wrote: > > Allow jsonrpc clients (e.g. ovn-controller) to specify jsonrpc input > > buffer size in order to reduce overhead when downloading huge db size > > since current value is 512B. The user can specify rpc buffer size using > > ovsdb_idl_set_remote routine passing requested value > > > > Signed-off-by: Lorenzo Bianconi > > Hmm, I thought that we had decided to just try a 4096-byte buffer to > start. That would be a much smaller patch. The marginal benefits of > buffers larger than that probably decline a lot since the processing > cost of 4096 bytes of JSON is probably a lot more than the overhead of a > system call. I imagine that any further benefit is probably from being > able to process more JSON per trip through the main loop. Those same > benefits could also be obtained by increasing the 'for' loop limit in > jsonrpc_recv() from 50 to some larger number. Hi Ben, thanks for the review. I tried to get a PoC (this is why I sent it as RFC) but since the IDL/reconnect code is pretty spread I ended up with a large patch. I agree we can just set the bufsize to 4096. If it is ok for everybody I will post a patch to set the buffer size to one page. Regards, Lorenzo ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] system-ovn.at: Create IPv6 load balancing tests
On Tue, Nov 05, 2019 at 12:23:09PM -0500, Russell Bryant wrote: > Duplicate all of the IPv4 load balancing test cases for IPv6. > All of these are passing without any changes needed in OVN code, but > this will help ensure that we do not have any IPv6 load balancing > regressions in the future. > > Signed-off-by: Russell Bryant > +# > +# A note on square brackets and IPv6 ... > +# > +# To get square brackets to not get interpreted by m4, this file is using: > +# > +# For [ --> @<:@ > +# For ] --> @:>@ > +# > +# > https://stackoverflow.com/questions/2308721/how-do-i-escape-text-in-autoconf-m4 You can usually get the same effect by just doubling the brackets, i.e. [[::1]]. You can find some examples with "git grep -F '[[::'". In some cases you end up with three sets of brackets because of outer quoting, e.g.: CHECK_STREAM_OPEN_BLOCK([tcp6], [[[::1]]]) ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC] jsonrpc: make jsonrpc input_buffer size parametric
On Tue, Nov 05, 2019 at 04:27:51PM +0200, Lorenzo Bianconi wrote: > Allow jsonrpc clients (e.g. ovn-controller) to specify jsonrpc input > buffer size in order to reduce overhead when downloading huge db size > since current value is 512B. The user can specify rpc buffer size using > ovsdb_idl_set_remote routine passing requested value > > Signed-off-by: Lorenzo Bianconi Hmm, I thought that we had decided to just try a 4096-byte buffer to start. That would be a much smaller patch. The marginal benefits of buffers larger than that probably decline a lot since the processing cost of 4096 bytes of JSON is probably a lot more than the overhead of a system call. I imagine that any further benefit is probably from being able to process more JSON per trip through the main loop. Those same benefits could also be obtained by increasing the 'for' loop limit in jsonrpc_recv() from 50 to some larger number. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] connection tracking questions
On Tue, Nov 05, 2019 at 12:04:39PM -0500, Nicolas Bouliane via dev wrote: > > Can you try: > > > > # ovs-appctl dpctl/dump-conntrack system@ovs-system zone=5 > > > > and see if you see the connection you've inserted is visible? > > > > It doesn't work either... but it made me realize that in my Docker > environment ovs-vswitchd runs with --enable-dummy=system ! (so it must be > it) Ah-hah, yes, good catch. > And I confirmed that it worked on a real Hypervisor and it worked. So glad to hear that! ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn] system-ovn.at: Create IPv6 load balancing tests
Duplicate all of the IPv4 load balancing test cases for IPv6. All of these are passing without any changes needed in OVN code, but this will help ensure that we do not have any IPv6 load balancing regressions in the future. Signed-off-by: Russell Bryant --- tests/system-ovn.at | 887 1 file changed, 810 insertions(+), 77 deletions(-) diff --git a/tests/system-ovn.at b/tests/system-ovn.at index b3f90aae2..2c37f759c 100644 --- a/tests/system-ovn.at +++ b/tests/system-ovn.at @@ -1158,6 +1158,164 @@ tcp,orig=(src=192.168.1.2,dst=30.0.0.2,sport=,dport=),reply=(s ]) +OVS_APP_EXIT_AND_WAIT([ovn-controller]) + +as ovn-sb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as ovn-nb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as northd +OVS_APP_EXIT_AND_WAIT([ovn-northd]) + +as +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d"]) +AT_CLEANUP + +# +# A note on square brackets and IPv6 ... +# +# To get square brackets to not get interpreted by m4, this file is using: +# +# For [ --> @<:@ +# For ] --> @:>@ +# +# https://stackoverflow.com/questions/2308721/how-do-i-escape-text-in-autoconf-m4 +# + +AT_SETUP([ovn -- load-balancing - IPv6]) +AT_KEYWORDS([ovnlb]) + +CHECK_CONNTRACK() +CHECK_CONNTRACK_NAT() +ovn_start +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-int]) + +# Set external-ids in br-int needed for ovn-controller +ovs-vsctl \ +-- set Open_vSwitch . external-ids:system-id=hv1 \ +-- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ +-- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ +-- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ +-- set bridge br-int fail-mode=secure other-config:disable-in-band=true + +# Start ovn-controller +start_daemon ovn-controller + +# Logical network: +# 2 logical switches "foo" (fd01::/64) and "bar" (fd02::/64) +# connected to a router R1. +# foo has foo1 to act as a client. +# bar has bar1, bar2, bar3 to act as servers. +# +# Loadbalancer VIPs in fd03::/64 network. + +ovn-nbctl create Logical_Router name=R1 +ovn-nbctl ls-add foo +ovn-nbctl ls-add bar + +# Connect foo to R1 +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 fd01::1/64 +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \ +type=router options:router-port=foo addresses=\"00:00:01:01:02:03\" + +# Connect bar to R1 +ovn-nbctl lrp-add R1 bar 00:00:01:01:02:04 fd02::1/64 +ovn-nbctl lsp-add bar rp-bar -- set Logical_Switch_Port rp-bar \ +type=router options:router-port=bar addresses=\"00:00:01:01:02:04\" + +# Create logical port 'foo1' in switch 'foo'. +ADD_NAMESPACES(foo1) +ADD_VETH(foo1, foo1, br-int, "fd01::2/64", "f0:00:00:01:02:03", \ + "fd01::1") +ovn-nbctl lsp-add foo foo1 \ +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 fd01::2" + +# Create logical ports 'bar1', 'bar2', 'bar3' in switch 'bar'. +ADD_NAMESPACES(bar1) +ADD_VETH(bar1, bar1, br-int, "fd02::2/64", "f0:00:0f:01:02:03", \ + "fd02::1") +ovn-nbctl lsp-add bar bar1 \ +-- lsp-set-addresses bar1 "f0:00:0f:01:02:03 fd02::2" + +ADD_NAMESPACES(bar2) +ADD_VETH(bar2, bar2, br-int, "fd02::3/64", "f0:00:0f:01:02:04", \ + "fd02::1") +ovn-nbctl lsp-add bar bar2 \ +-- lsp-set-addresses bar2 "f0:00:0f:01:02:04 fd02::3" + +ADD_NAMESPACES(bar3) +ADD_VETH(bar3, bar3, br-int, "fd02::4/64", "f0:00:0f:01:02:05", \ + "fd02::1") +ovn-nbctl lsp-add bar bar3 \ +-- lsp-set-addresses bar3 "f0:00:0f:01:02:05 fd02::4" + +# Config OVN load-balancer with a VIP. +uuid=`ovn-nbctl create load_balancer vips:\"fd03::1\"=\"fd02::2,fd02::3,fd02::4\"` +ovn-nbctl set logical_switch foo load_balancer=$uuid + +# Create another load-balancer with another VIP. +uuid=`ovn-nbctl create load_balancer vips:\"fd03::3\"=\"fd02::2,fd02::3,fd02::4\"` +ovn-nbctl add logical_switch foo load_balancer $uuid + +# Config OVN load-balancer with another VIP (this time with ports). +ovn-nbctl set load_balancer $uuid vips:'"@<:@fd03::2@:>@:8000"'='"@<:@fd02::2@:>@:80,@<:@fd02::3@:>@:80,@<:@fd02::4@:>@:80"' + +# Wait for ovn-controller to catch up. +ovn-nbctl --wait=hv sync +OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-groups br-int | \ +grep 'nat(dst=\@<:@fd02::4\@:>@:80)']) + +# Start webservers in 'bar1', 'bar2' and 'bar3'. +OVS_START_L7([bar1], [http6]) +OVS_START_L7([bar2], [http6]) +OVS_START_L7([bar3], [http6]) + +dnl Should work with the virtual IP fd03::1 address through NAT +for i in `seq 1 20`; do +echo Request $i +NS_CHECK_EXEC([foo1], [wget http://@<:@fd03::1@:>@ -t 5 -T 1 --retry-connrefused -v -o wget$i.log || (ovs-ofctl -O OpenFlow13 dump-flows br-int && false)]) +done + +dnl Each server should have at least one connection. +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd03::1) | grep -v fe80 | \ +sed -e 's/zone=[[0-9]]*/zone=/'], [0], [dnl +tcp,orig=(src=fd01::2,dst=fd03::1,sport=,dport=),reply=(src=fd02::2,dst=fd01::2,sport=,dport=),zone=,protoinfo=(state=)
[ovs-dev] [PATCH] bridge: Allow manual notifications about interfaces' updates.
Sometimes interface updates could happen in a way ifnotifier is not able to catch. For example some heavy operations (device reset) in netdev-dpdk could require re-applying of the bridge configuration. For this purpose new manual notifier introduced. Its function 'if_notifier_manual_report()' could be called directly by the code that aware about changes. This new notifier is thread-safe. Signed-off-by: Ilya Maximets --- This functionality might be useful in context of the following patch: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364155.html It will allow us to not introduce heavy dpdk notifier just to update one sequence number on RTE_ETH_EVENT_INTR_RESET events. We could also use it to report other changes from DPDK, e.g. LSC interrupts. Version 1: * Sending as an official patch. * No changes since RFC v2. RFC v2: * Main functions moved from bridge.c to the new lib/if-notifier-manual.c to allow using from other lib code. lib/automake.mk | 3 ++- lib/if-notifier-manual.c | 55 lib/if-notifier.h| 7 + vswitchd/bridge.c| 2 ++ 4 files changed, 66 insertions(+), 1 deletion(-) create mode 100644 lib/if-notifier-manual.c diff --git a/lib/automake.mk b/lib/automake.mk index 17b36b43d..ebf714501 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -111,6 +111,8 @@ lib_libopenvswitch_la_SOURCES = \ lib/hmapx.h \ lib/id-pool.c \ lib/id-pool.h \ + lib/if-notifier-manual.c \ + lib/if-notifier.h \ lib/ipf.c \ lib/ipf.h \ lib/jhash.c \ @@ -394,7 +396,6 @@ lib_libopenvswitch_la_SOURCES += \ lib/dpif-netlink-rtnl.c \ lib/dpif-netlink-rtnl.h \ lib/if-notifier.c \ - lib/if-notifier.h \ lib/netdev-linux.c \ lib/netdev-linux.h \ lib/netdev-linux-private.h \ diff --git a/lib/if-notifier-manual.c b/lib/if-notifier-manual.c new file mode 100644 index 0..72d6d8b8d --- /dev/null +++ b/lib/if-notifier-manual.c @@ -0,0 +1,55 @@ +/* + * Copyright (c) 2019 Ilya Maximets . + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include "openvswitch/compiler.h" +#include "openvswitch/thread.h" +#include "openvswitch/util.h" +#include "if-notifier.h" + +/* Implementation of a manual interface notifier. + * + * Intended for catching interface events that could not be tracked by + * OS specific interface notifiers, e.g. iface updates in netdev-dpdk. + * For that purpose 'if_notifier_manual_report()' should be called directly + * by the code that aware of interface changes. + * + * Thread-safety + * = + * This notifier is thread-safe in terms of calling its functions from + * different thread contexts, however if the callback passed to + * 'if_notifier_manual_set_cb' is used by some other code (i.e. by OS + * specific notifiers) it must be thread-safe itself. + */ + +static struct ovs_mutex manual_notifier_mutex = OVS_MUTEX_INITIALIZER; +static if_notify_func *notify OVS_GUARDED_BY(manual_notifier_mutex) = NULL; + +void if_notifier_manual_set_cb(if_notify_func *cb) +{ +ovs_mutex_lock(_notifier_mutex); +notify = cb; +ovs_mutex_unlock(_notifier_mutex); +} + +void if_notifier_manual_report(void) +{ +ovs_mutex_lock(_notifier_mutex); +if (notify) { +notify(NULL); +} +ovs_mutex_unlock(_notifier_mutex); +} diff --git a/lib/if-notifier.h b/lib/if-notifier.h index 259138f70..dae852e9f 100644 --- a/lib/if-notifier.h +++ b/lib/if-notifier.h @@ -27,4 +27,11 @@ void if_notifier_destroy(struct if_notifier *); void if_notifier_run(void); void if_notifier_wait(void); +/* Below functions are reserved for if-notifier-manual , i.e. for manual + * notifications from the OVS code. + * Must not be implemented by OS specific notifiers. */ + +void if_notifier_manual_set_cb(if_notify_func *); +void if_notifier_manual_report(void); + #endif /* if-notifier.h */ diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c index 9095ebf5d..37131712d 100644 --- a/vswitchd/bridge.c +++ b/vswitchd/bridge.c @@ -529,11 +529,13 @@ bridge_init(const char *remote) ifaces_changed = seq_create(); last_ifaces_changed = seq_read(ifaces_changed); ifnotifier = if_notifier_create(if_change_cb, NULL); +if_notifier_manual_set_cb(if_change_cb); } void bridge_exit(bool delete_datapath) { +if_notifier_manual_set_cb(NULL); if_notifier_destroy(ifnotifier);
Re: [ovs-dev] connection tracking questions
> > > > Can you try: > > # ovs-appctl dpctl/dump-conntrack system@ovs-system zone=5 > > and see if you see the connection you've inserted is visible? > It doesn't work either... but it made me realize that in my Docker environment ovs-vswitchd runs with --enable-dummy=system ! (so it must be it) And I confirmed that it worked on a real Hypervisor and it worked. Thanks ! ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv3] netdev-afxdp: Enable loading XDP program.
Hi William, See some comments inline. //Eelco On 1 Nov 2019, at 21:14, William Tu wrote: Now netdev-afxdp always forwards all packets to userspace because it is using libbpf's default XDP program, see 'xsk_load_xdp_prog'. There are some cases when users want to keep packets in kernel instead of sending to userspace, for example, management traffic such as SSH should be processed in kernel. The patch enables loading the user-provide XDP program by doing $ovs-vsctl -- set int afxdp-p0 options:xdp-obj= So users can implement their filtering logic or traffic steering idea in their XDP program, and rest of the traffic passes to AF_XDP socket handled by OVS. Signed-off-by: William Tu --- v3: Feedbacks from Eelco. - keep using xdpobj not xdp-obj (because we alread use xdpmode) or we change both to xdp-obj and xdp-mode? - log a info message when using external program for better debugging - combine some failure messages - update doc NEW: - add options:xdpobj=__default__, to set back to libbpf default prog - Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/606153231 v2: A couple fixes and remove RFC --- Documentation/intro/install/afxdp.rst | 49 +++ lib/netdev-afxdp.c| 148 +- lib/netdev-linux-private.h| 2 + 3 files changed, 181 insertions(+), 18 deletions(-) diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst index a136db0c950a..89bc97da9548 100644 --- a/Documentation/intro/install/afxdp.rst +++ b/Documentation/intro/install/afxdp.rst @@ -273,6 +273,55 @@ Or, use OVS pmd tool:: ovs-appctl dpif-netdev/pmd-stats-show +Loading Custom XDP Program +-- +By defailt, netdev-afxdp always forwards all packets to userspace because +it is using libbpf's default XDP program. There are some cases when users +want to keep packets in kernel instead of sending to userspace, for example, +management traffic such as SSH should be processed in kernel. This can be +done by loading the user-provided XDP program:: + + ovs-vsctl -- set int afxdp-p0 options:xdpobj= + +So users can implement their filtering logic or traffic steering idea +in their XDP program, and rest of the traffic passes to AF_XDP socket +handled by OVS. To set it back to default, use:: + + ovs-vsctl -- set int afxdp-p0 options:xdpobj=__default__ + +Below is a sample C program compiled under kernel's samples/bpf/. + +.. code-block:: c + + #include + #include "bpf_helpers.h" + You mentioned you would add the below but did not, any reason, or just forgot about it? #if LINUX_VERSION_CODE < KERNEL_VERSION(5,3,0) /* Kernel version before 5.3 needed an additional map */ struct bpf_map_def SEC("maps") qidconf_map = { .type = BPF_MAP_TYPE_ARRAY, .key_size = sizeof(int), .value_size = sizeof(int), .max_entries = 64, }; #endif + /* OVS will associate map 'xsks_map' to xsk socket. */ + struct bpf_map_def SEC("maps") xsks_map = { + .type = BPF_MAP_TYPE_XSKMAP, + .key_size = sizeof(int), + .value_size = sizeof(int), + .max_entries = 32, + }; + + SEC("xdp_sock") + int xdp_sock_prog(struct xdp_md *ctx) + { + int index = ctx->rx_queue_index; + + /* Customized by user. + * For example + * 1) filter out all SSH traffic and return XDP_PASS + *for kernel to process. + * 2) Drop unwanted packet by returning XDP_DROP. + */ + + /* Rest of packets goes to AF_XDP. */ + return bpf_redirect_map(_map, index, 0); + } + char _license[] SEC("license") = "GPL"; + + Example Script -- diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c index af654d498a88..be74527946a4 100644 --- a/lib/netdev-afxdp.c +++ b/lib/netdev-afxdp.c @@ -21,6 +21,7 @@ #include "netdev-afxdp.h" #include "netdev-afxdp-pool.h" +#include #include #include #include @@ -30,6 +31,7 @@ #include #include #include +#include #include #include @@ -88,8 +90,11 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS); #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base)) +#define LIBBPF_XDP_PROGRAM "__default__" + static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id, - int mode, bool use_need_wakeup); + int mode, bool use_need_wakeup, + bool use_default_xdp); static void dest(uint32_t ifindex, int xdpmode); static void xsk_destroy(struct xsk_socket_info *xsk); static int xsk_configure_all(struct netdev *netdev); @@ -213,6 +218,50 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED) ovs_mutex_unlock(_pools_mutex); } +static int +xsk_load_prog(const char *path, struct bpf_object **obj, + int *prog_fd) +{ +struct bpf_prog_load_attr attr = { +
Re: [ovs-dev] [PATCH v2] netdev-dpdk: Track vhost tx contention.
On 20.10.2019 14:31, David Marchand wrote: Hello, On Tue, Oct 15, 2019 at 2:13 PM Ilya Maximets wrote: On 14.10.2019 20:04, Aaron Conole wrote: David Marchand writes: Add a coverage counter to help diagnose contention on the vhost txqs. This is seen as dropped packets on the physical ports for rates that are usually handled fine by OVS. Document how to further debug this contention with perf. Signed-off-by: David Marchand --- Changelog since v1: - added documentation as a bonus: not sure this is the right place, or if it really makes sense to enter into such details. But I still find it useful. Comments? It's useful, and I think it makes sense here. That's an interesting debug method, but it looks not very suitable for an end-user documentation. One thing that bothers me the most is referencing C code snippets in this kind of documentation. Ok, can we conclude on the coverage counter wrt the v1 then? https://patchwork.ozlabs.org/patch/1153238/ Should I submit a v3 with the doc update (but removing the parts about perf) ? 'perf' part should not be there. I'm in doubt about the rest of the docs. This part might be useful, but it doesn't provide any solution for the problem and I really don't know if there is something we can suggest in that case. "Rework your network topology" doesn't sound like a friendly solution or exact steps to do. One more thing is that documenting coverage counters doesn't look like a good idea to me and I'd like to not create a precedent. One day we'll rework this to be some "PMD/netdev performance statistics" and it'll be OK to document it. But there is nothing more permanent than a temporary solution. Right now the easiest way for me is to just apply v1. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v11 1/3] netdev-dpdk: Reuse vhost function for dpdk ETH custom stats.
On 05.11.2019 13:54, Sriram Vatala wrote: Thanks Kevin for your response. @i.maxim...@ovn.org : Can you please review the patch set. I have it on my ToDo list for this week. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [RFC] jsonrpc: make jsonrpc input_buffer size parametric
Allow jsonrpc clients (e.g. ovn-controller) to specify jsonrpc input buffer size in order to reduce overhead when downloading huge db size since current value is 512B. The user can specify rpc buffer size using ovsdb_idl_set_remote routine passing requested value Signed-off-by: Lorenzo Bianconi --- lib/jsonrpc.c | 23 --- lib/jsonrpc.h | 5 +++-- lib/netdev-dummy.c| 5 +++-- lib/ovsdb-idl.c | 13 + lib/ovsdb-idl.h | 2 +- lib/stream-fd.c | 4 ++-- lib/stream-fd.h | 2 +- lib/stream-provider.h | 5 +++-- lib/stream-ssl.c | 11 ++- lib/stream-tcp.c | 12 +++- lib/stream-unix.c | 6 +++--- lib/stream-windows.c | 5 +++-- lib/stream.c | 12 +++- lib/stream.h | 4 ++-- lib/unixctl.c | 3 ++- lib/vconn-stream.c| 2 +- ovsdb/ovsdb-client.c | 4 ++-- tests/test-jsonrpc.c | 4 ++-- tests/test-ovsdb.c| 2 +- tests/test-stream.c | 2 +- utilities/ovs-vsctl.c | 2 +- 21 files changed, 76 insertions(+), 52 deletions(-) diff --git a/lib/jsonrpc.c b/lib/jsonrpc.c index b9619b822..7e9aa62ae 100644 --- a/lib/jsonrpc.c +++ b/lib/jsonrpc.c @@ -33,6 +33,7 @@ #include "svec.h" #include "timeval.h" #include "openvswitch/vlog.h" +#include "stream-provider.h" VLOG_DEFINE_THIS_MODULE(jsonrpc); @@ -43,7 +44,7 @@ struct jsonrpc { /* Input. */ struct byteq input; -uint8_t input_buffer[512]; +uint8_t *input_buffer; struct json_parser *parser; /* Output. */ @@ -62,9 +63,11 @@ static void jsonrpc_error(struct jsonrpc *, int error); /* This is just the same as stream_open() except that it uses the default * JSONRPC port if none is specified. */ int -jsonrpc_stream_open(const char *name, struct stream **streamp, uint8_t dscp) +jsonrpc_stream_open(const char *name, struct stream **streamp, +uint8_t dscp, int bufsize) { -return stream_open_with_default_port(name, OVSDB_PORT, streamp, dscp); +return stream_open_with_default_port(name, OVSDB_PORT, streamp, dscp, + bufsize); } /* This is just the same as pstream_open() except that it uses the default @@ -80,14 +83,16 @@ jsonrpc_pstream_open(const char *name, struct pstream **pstreamp, uint8_t dscp) struct jsonrpc * jsonrpc_open(struct stream *stream) { +int bufsize = stream->bufsize > 0 ? stream->bufsize : 512; struct jsonrpc *rpc; ovs_assert(stream != NULL); rpc = xzalloc(sizeof *rpc); +rpc->input_buffer = xzalloc(bufsize); rpc->name = xstrdup(stream_get_name(stream)); rpc->stream = stream; -byteq_init(>input, rpc->input_buffer, sizeof rpc->input_buffer); +byteq_init(>input, rpc->input_buffer, bufsize); ovs_list_init(>output); return rpc; @@ -101,6 +106,7 @@ jsonrpc_close(struct jsonrpc *rpc) if (rpc) { jsonrpc_cleanup(rpc); free(rpc->name); +free(rpc->input_buffer); free(rpc); } } @@ -787,6 +793,7 @@ struct jsonrpc_session { int last_error; unsigned int seqno; uint8_t dscp; +int bufsize; }; static void @@ -814,11 +821,12 @@ struct jsonrpc_session * jsonrpc_session_open(const char *name, bool retry) { const struct svec remotes = { .names = (char **) , .n = 1 }; -return jsonrpc_session_open_multiple(, retry); +return jsonrpc_session_open_multiple(, retry, 512); } struct jsonrpc_session * -jsonrpc_session_open_multiple(const struct svec *remotes, bool retry) +jsonrpc_session_open_multiple(const struct svec *remotes, bool retry, + int bufsize) { struct jsonrpc_session *s; @@ -839,6 +847,7 @@ jsonrpc_session_open_multiple(const struct svec *remotes, bool retry) s->seqno = 0; s->dscp = 0; s->last_error = 0; +s->bufsize = bufsize; const char *name = reconnect_get_name(s->reconnect); if (!pstream_verify_name(name)) { @@ -931,7 +940,7 @@ jsonrpc_session_connect(struct jsonrpc_session *s) jsonrpc_session_disconnect(s); if (!reconnect_is_passive(s->reconnect)) { -error = jsonrpc_stream_open(name, >stream, s->dscp); +error = jsonrpc_stream_open(name, >stream, s->dscp, s->bufsize); if (!error) { reconnect_connecting(s->reconnect, time_msec()); } else { diff --git a/lib/jsonrpc.h b/lib/jsonrpc.h index a44114e8d..2c77732c2 100644 --- a/lib/jsonrpc.h +++ b/lib/jsonrpc.h @@ -40,7 +40,8 @@ struct svec; #define OVSDB_OLD_PORT 6632 #define OVSDB_PORT 6640 -int jsonrpc_stream_open(const char *name, struct stream **, uint8_t dscp); +int jsonrpc_stream_open(const char *name, struct stream **, uint8_t dscp, +int bufsize); int jsonrpc_pstream_open(const char *name, struct pstream **, uint8_t dscp); struct jsonrpc *jsonrpc_open(struct stream *); @@ -105,7 +106,7 @@ char *jsonrpc_msg_to_string(const struct jsonrpc_msg *);
Re: [ovs-dev] [PATCH v11 1/3] netdev-dpdk: Reuse vhost function for dpdk ETH custom stats.
Thanks Kevin for your response. @i.maxim...@ovn.org : Can you please review the patch set. Thanks & Regards, Sriram. -Original Message- From: Kevin Traynor Sent: 05 November 2019 16:07 To: Sriram Vatala ; ovs-dev@openvswitch.org; i.maxim...@ovn.org Cc: 'Ilya Maximets' Subject: Re: [PATCH v11 1/3] netdev-dpdk: Reuse vhost function for dpdk ETH custom stats. On 05/11/2019 04:52, Sriram Vatala wrote: > Hi All, > Please consider this as a gentle remainder. > > Thanks & Regards, > Sriram. > > -Original Message- > From: Sriram Vatala > Sent: 30 October 2019 11:57 > To: 'ovs-dev@openvswitch.org' ; > 'ktray...@redhat.com' ; 'i.maxim...@ovn.org' > > Cc: 'Ilya Maximets' > Subject: RE: [PATCH v11 1/3] netdev-dpdk: Reuse vhost function for > dpdk ETH custom stats. > > Hi All, > Please review the update patch. > > Changes since v10 : > Corrected the spelling mistake in the commit message. > Hi Sriram, I've acked v10, so not planning to re-review unless there are some other changes. thanks, Kevin. p.s. In general if it is only minor changes it is usually ok to keep acks from a previous version. If it is doubtful you can check with the reviewer. > Thanks & Regards, > Sriram. > > -Original Message- > From: Sriram Vatala > Sent: 29 October 2019 20:20 > To: ovs-dev@openvswitch.org; ktray...@redhat.com; i.maxim...@ovn.org > Cc: Ilya Maximets ; Sriram Vatala > > Subject: [PATCH v11 1/3] netdev-dpdk: Reuse vhost function for dpdk > ETH custom stats. > > From: Ilya Maximets > > This is yet another refactoring for upcoming detailed drop stats. > It allows to use single function for all the software calculated > statistics in netdev-dpdk for both vhost and ETH ports. > > UINT64_MAX used as a marker for non-supported statistics in a same way > as it's done in bridge.c for common netdev stats. > > Co-authored-by: Sriram Vatala > Cc: Sriram Vatala > Signed-off-by: Ilya Maximets > Signed-off-by: Sriram Vatala > --- > lib/netdev-dpdk.c | 69 > +++ > 1 file changed, 40 insertions(+), 29 deletions(-) > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index > 04e1a2d1b..2cc2516a9 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -474,6 +474,8 @@ struct netdev_rxq_dpdk { static void > netdev_dpdk_destruct(struct netdev *netdev); static void > netdev_dpdk_vhost_destruct(struct netdev *netdev); > > +static int netdev_dpdk_get_sw_custom_stats(const struct netdev *, > + struct netdev_custom_stats > +*); > static void netdev_dpdk_clear_xstats(struct netdev_dpdk *dev); > > int netdev_dpdk_get_vid(const struct netdev_dpdk *dev); @@ -1174,7 > +1176,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no, > dev->rte_xstats_ids = NULL; > dev->rte_xstats_ids_size = 0; > > -dev->tx_retries = 0; > +dev->tx_retries = (dev->type == DPDK_DEV_VHOST) ? 0 : UINT64_MAX; > > return 0; > } > @@ -2775,7 +2777,9 @@ netdev_dpdk_get_custom_stats(const struct netdev > *netdev, > > uint32_t i; > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > -int rte_xstats_ret; > +int rte_xstats_ret, sw_stats_size; > + > +netdev_dpdk_get_sw_custom_stats(netdev, custom_stats); > > ovs_mutex_lock(>mutex); > > @@ -2790,23 +2794,22 @@ netdev_dpdk_get_custom_stats(const struct > netdev *netdev, > if (rte_xstats_ret > 0 && > rte_xstats_ret <= dev->rte_xstats_ids_size) { > > -custom_stats->size = rte_xstats_ret; > -custom_stats->counters = > -(struct netdev_custom_counter *) > xcalloc(rte_xstats_ret, > -sizeof(struct netdev_custom_counter)); > +sw_stats_size = custom_stats->size; > +custom_stats->size += rte_xstats_ret; > +custom_stats->counters = xrealloc(custom_stats->counters, > + custom_stats->size * > + sizeof > + *custom_stats->counters); > > for (i = 0; i < rte_xstats_ret; i++) { > -ovs_strlcpy(custom_stats->counters[i].name, > +ovs_strlcpy(custom_stats->counters[sw_stats_size + > + i].name, > netdev_dpdk_get_xstat_name(dev, > > dev->rte_xstats_ids[i]), > NETDEV_CUSTOM_STATS_NAME_SIZE); > -custom_stats->counters[i].value = values[i]; > +custom_stats->counters[sw_stats_size + i].value = > + values[i]; > } > } else { > VLOG_WARN("Cannot get XSTATS values for port: > "DPDK_PORT_ID_FMT, >dev->port_id); > -custom_stats->counters = NULL; > -custom_stats->size = 0; > /* Let's clear statistics cache, so it will be > * reconfigured */ >
[ovs-dev] [RFC PATCH v2 ovn 5/5] NEWS: Add CoPP support.
Signed-off-by: Dumitru Ceara --- NEWS |1 + 1 file changed, 1 insertion(+) diff --git a/NEWS b/NEWS index 0ad9677..d7ef6e7 100644 --- a/NEWS +++ b/NEWS @@ -4,6 +4,7 @@ Post-OVS-v2.12.0 independently. - Added IPv6 NAT support for OVN routers. - Added Stateless Floating IP support in OVN. + - Added Control Plane Protection support (control plane traffic metering). v2.12.0 - 03 Sep 2019 - ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [RFC PATCH v2 ovn 4/5] ovn-northd: Extend metering to Controller-Events
Signed-off-by: Dumitru Ceara --- include/ovn/actions.h |1 - lib/actions.c | 50 + northd/ovn-northd.c | 15 +++ tests/ovn.at |3 +-- 4 files changed, 17 insertions(+), 52 deletions(-) diff --git a/include/ovn/actions.h b/include/ovn/actions.h index 8fc7727..9371e32 100644 --- a/include/ovn/actions.h +++ b/include/ovn/actions.h @@ -342,7 +342,6 @@ struct ovnact_controller_event { int event_type; /* controller event type */ struct ovnact_gen_option *options; size_t n_options; -char *meter; }; /* OVNACT_BIND_VPORT. */ diff --git a/lib/actions.c b/lib/actions.c index 32bc4d5..7708712 100644 --- a/lib/actions.c +++ b/lib/actions.c @@ -1292,9 +1292,6 @@ format_TRIGGER_EVENT(const struct ovnact_controller_event *event, { ds_put_format(s, "trigger_event(event = \"%s\"", event_to_string(event->event_type)); -if (event->meter) { -ds_put_format(s, ", meter = \"%s\"", event->meter); -} for (const struct ovnact_gen_option *o = event->options; o < >options[event->n_options]; o++) { ds_put_cstr(s, ", "); @@ -1439,24 +1436,11 @@ encode_event_empty_lb_backends_opts(struct ofpbuf *ofpacts, static void encode_TRIGGER_EVENT(const struct ovnact_controller_event *event, - const struct ovnact_encode_params *ep OVS_UNUSED, + const struct ovnact_encode_params *ep, struct ofpbuf *ofpacts) { -uint32_t meter_id = NX_CTLR_NO_METER; -size_t oc_offset; - -if (event->meter) { -meter_id = ovn_extend_table_assign_id(ep->meter_table, event->meter, - ep->lflow_uuid); -if (meter_id == EXT_TABLE_ID_INVALID) { -VLOG_WARN("Unable to assign id for trigger meter: %s", - event->meter); -return; -} -} - -oc_offset = encode_start_controller_op(ACTION_OPCODE_EVENT, false, - meter_id, ofpacts); +size_t oc_offset = encode_start_controller_op(ACTION_OPCODE_EVENT, false, + ep->ctrl_meter_id, ofpacts); ovs_be32 ofs = htonl(event->event_type); ofpbuf_put(ofpacts, , sizeof ofs); @@ -1878,27 +1862,12 @@ parse_trigger_event(struct action_context *ctx, sizeof *event->options); } -if (lexer_match_id(ctx->lexer, "meter")) { -if (!lexer_force_match(ctx->lexer, LEX_T_EQUALS)) { -return; -} -/* If multiple meters are given, use the most recent. */ -if (ctx->lexer->token.type == LEX_T_STRING && -strlen(ctx->lexer->token.s)) { -free(event->meter); -event->meter = xstrdup(ctx->lexer->token.s); -} else if (ctx->lexer->token.type != LEX_T_STRING) { -lexer_syntax_error(ctx->lexer, "expecting string"); -return; -} -lexer_get(ctx->lexer); -} else { -struct ovnact_gen_option *o = >options[event->n_options++]; -memset(o, 0, sizeof *o); -parse_gen_opt(ctx, o, ->pp->controller_event_opts->event_opts[event_type], -event_to_string(event_type)); -} +struct ovnact_gen_option *o = >options[event->n_options++]; +memset(o, 0, sizeof *o); +parse_gen_opt(ctx, o, + >pp->controller_event_opts->event_opts[event_type], + event_to_string(event_type)); + if (ctx->lexer->error) { return; } @@ -1919,7 +1888,6 @@ static void ovnact_controller_event_free(struct ovnact_controller_event *event) { free_gen_options(event->options, event->n_options); -free(event->meter); } static void diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index ec45021..4e53690 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -4288,11 +4288,7 @@ build_empty_lb_event_flow(struct ovn_datapath *od, struct hmap *lflows, } struct ds match = DS_EMPTY_INITIALIZER; -char *meter = "", *action; - -if (meter_groups && shash_find(meter_groups, "event-elb")) { -meter = "event-elb"; -} +char *action; if (addr_family == AF_INET) { ds_put_format(, "ip4.dst == %s && %s", @@ -4306,13 +4302,16 @@ build_empty_lb_event_flow(struct ovn_datapath *od, struct hmap *lflows, port); } action = xasprintf("trigger_event(event = \"%s\", " - "meter = \"%s\", vip = \"%s\", " + "vip = \"%s\", " "protocol = \"%s\", " "load_balancer = \"" UUID_FMT "\");", event_to_string(OVN_EVENT_EMPTY_LB_BACKENDS), -
[ovs-dev] [RFC PATCH v2 ovn 3/5] ovn-northd: Add CoPP policies for flows that punt packets to ovn-controller.
Change the ovn-northd implementation to set the new 'controller_meter' field for flows that need to punt packets to ovn-controller. For protocols that get per port logical flows, any potential per port CoPP policies take precedence over switch/router CoPP policies. Protocol packets for which CoPP is enforced when sending packets to ovn-controller (if configured): - ARP - ND_NS - ND_NA - ND_RA - DNS - IGMP - packets that require ARP resolution before forwarding - packets that require ND_NS before forwarding - packets that need to be replied to with ICMP Errors - packets that need to be replied to with TCP RST - packets that need to be replied to with DHCP_OPTS Signed-off-by: Dumitru Ceara --- northd/ovn-northd.c | 242 +++ 1 file changed, 164 insertions(+), 78 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 4808299..ec45021 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -4448,8 +4448,9 @@ build_acl_log(struct ds *actions, const struct nbrec_acl *acl) static void build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows, - enum ovn_stage stage, struct nbrec_acl *acl, - struct ds *extra_match, struct ds *extra_actions) + struct shash *meter_groups, enum ovn_stage stage, + struct nbrec_acl *acl, struct ds *extra_match, + struct ds *extra_actions) { struct ds match = DS_EMPTY_INITIALIZER; struct ds actions = DS_EMPTY_INITIALIZER; @@ -4465,8 +4466,11 @@ build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows, "eth.dst <-> eth.src; ip4.dst <-> ip4.src; " "tcp_reset { outport <-> inport; %s };", ingress ? "output;" : "next(pipeline=ingress,table=0);"); -ovn_lflow_add(lflows, od, stage, acl->priority + OVN_ACL_PRI_OFFSET + 10, - ds_cstr(), ds_cstr()); +ovn_lflow_add_ctrl(lflows, od, stage, + acl->priority + OVN_ACL_PRI_OFFSET + 10, + ds_cstr(), ds_cstr(), + copp_meter_get(COPP_TCP_RESET, od->nbs->copp, + meter_groups)); ds_clear(); ds_clear(); build_acl_log(, acl); @@ -4478,8 +4482,11 @@ build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows, "eth.dst <-> eth.src; ip6.dst <-> ip6.src; " "tcp_reset { outport <-> inport; %s };", ingress ? "output;" : "next(pipeline=ingress,table=0);"); -ovn_lflow_add(lflows, od, stage, acl->priority + OVN_ACL_PRI_OFFSET + 10, - ds_cstr(), ds_cstr()); +ovn_lflow_add_ctrl(lflows, od, stage, + acl->priority + OVN_ACL_PRI_OFFSET + 10, + ds_cstr(), ds_cstr(), + copp_meter_get(COPP_TCP_RESET, od->nbs->copp, + meter_groups)); /* IP traffic */ ds_clear(); @@ -4496,8 +4503,10 @@ build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows, "eth.dst <-> eth.src; ip4.dst <-> ip4.src; " "icmp4 { outport <-> inport; %s };", ingress ? "output;" : "next(pipeline=ingress,table=0);"); -ovn_lflow_add(lflows, od, stage, acl->priority + OVN_ACL_PRI_OFFSET, - ds_cstr(), ds_cstr()); +ovn_lflow_add_ctrl(lflows, od, stage, acl->priority + OVN_ACL_PRI_OFFSET, + ds_cstr(), ds_cstr(), + copp_meter_get(COPP_ICMP4_ERR, od->nbs->copp, + meter_groups)); ds_clear(); ds_clear(); build_acl_log(, acl); @@ -4512,16 +4521,19 @@ build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows, "eth.dst <-> eth.src; ip6.dst <-> ip6.src; " "outport <-> inport; %s };", ingress ? "output;" : "next(pipeline=ingress,table=0);"); -ovn_lflow_add(lflows, od, stage, acl->priority + OVN_ACL_PRI_OFFSET, - ds_cstr(), ds_cstr()); +ovn_lflow_add_ctrl(lflows, od, stage, acl->priority + OVN_ACL_PRI_OFFSET, + ds_cstr(), ds_cstr(), + copp_meter_get(COPP_ICMP6_ERR, od->nbs->copp, + meter_groups)); ds_destroy(); ds_destroy(); } static void -consider_acl(struct hmap *lflows, struct ovn_datapath *od, - struct nbrec_acl *acl, bool has_stateful) +consider_acl(struct hmap *lflows, struct shash *meter_groups, + struct ovn_datapath *od, struct nbrec_acl *acl, + bool has_stateful) { bool ingress = !strcmp(acl->direction, "from-lport") ? true :false; enum ovn_stage stage = ingress ? S_SWITCH_IN_ACL : S_SWITCH_OUT_ACL; @@ -4610,8 +4622,8 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
[ovs-dev] [RFC PATCH v2 ovn 2/5] ovn-northd: Add support for CoPP.
Add new 'Copp' (Control plane protection) table to OVN Northbound DB: - this stores mappings between control plane protocol names and meters that should be used to rate limit controller-destined traffic for those protocols. Add new 'copp' columns to the following OVN Northbound DB tables: - Logical_Switch - Logical_Switch_Port - Logical_Router - Logical_Router_Port This allows defining control plane policies with different granularities. For example a user can decide to enforce a general policy for the logical switch but at the same time configure a different policy on some of the ports of the logical switch. Control plane protocol policies applied to a logical port take precedence over the ones defined at logical switch level. For logical routers and logical router ports we take the same approach. For now, no control plane protection policy is installed for any of the existing flows that punt packets to ovn-controller. This will be added in follow-up patches. Add CLI commands in 'ovn-nbctl' to allow the user to manage Control Plane Protection Policies at different levels (logical switch, logical router, logical port). Signed-off-by: Dumitru Ceara --- lib/automake.mk |2 lib/copp.c| 99 +++ lib/copp.h| 58 ++ northd/ovn-northd.c | 43 +++-- ovn-nb.ovsschema | 24 ++- ovn-nb.xml| 91 ++ utilities/ovn-nbctl.8.xml | 94 ++ utilities/ovn-nbctl.c | 412 + 8 files changed, 808 insertions(+), 15 deletions(-) create mode 100644 lib/copp.c create mode 100644 lib/copp.h diff --git a/lib/automake.mk b/lib/automake.mk index 0c8245c..7ba7ca0 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -9,6 +9,8 @@ lib_libovn_la_SOURCES = \ lib/actions.c \ lib/chassis-index.c \ lib/chassis-index.h \ + lib/copp.c \ + lib/copp.h \ lib/ovn-dirs.h \ lib/expr.c \ lib/extend-table.h \ diff --git a/lib/copp.c b/lib/copp.c new file mode 100644 index 000..820cc29 --- /dev/null +++ b/lib/copp.c @@ -0,0 +1,99 @@ +/* Copyright (c) 2019, Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include + +#include "openvswitch/shash.h" +#include "smap.h" +#include "lib/ovn-nb-idl.h" +#include "lib/copp.h" + +static char *copp_proto_names[COPP_PROTO_MAX] = { +[COPP_ARP] = "arp", +[COPP_ARP_RESOLVE] = "arp-resolve", +[COPP_DHCPV4_OPTS] = "dhcpv4-opts", +[COPP_DHCPV6_OPTS] = "dhcpv6-opts", +[COPP_DNS] = "dns", +[COPP_EVENT_ELB] = "event-elb", +[COPP_ICMP4_ERR] = "icmp4-error", +[COPP_ICMP6_ERR] = "icmp6-error", +[COPP_IGMP] = "igmp", +[COPP_ND_NA] = "nd-na", +[COPP_ND_NS] = "nd-ns", +[COPP_ND_NS_RESOLVE] = "nd-ns-resolve", +[COPP_ND_RA_OPTS]= "nd-ra-opts", +[COPP_TCP_RESET] = "tcp-reset", +}; + +static bool copp_port_support[COPP_PROTO_MAX] = { +[COPP_DHCPV4_OPTS] = true, +[COPP_DHCPV6_OPTS] = true, +[COPP_ICMP4_ERR] = true, +[COPP_ICMP6_ERR] = true, +[COPP_ND_RA_OPTS] = true, +[COPP_TCP_RESET] = true, +}; + +/* Return true if the copp meter can be configured on a logical port. Return + * false if the meter is only supported on a logical switch/router. + */ +bool copp_port_meter_supported(enum copp_proto proto) +{ +if (proto >= COPP_PROTO_MAX) { +return false; +} + +return copp_port_support[proto]; +} + +const char * +copp_proto_get(enum copp_proto proto) +{ +if (proto >= COPP_PROTO_MAX) { +return ""; +} +return copp_proto_names[proto]; +} + +const char * +copp_meter_get(enum copp_proto proto, const struct nbrec_copp *copp, + const struct shash *meter_groups) +{ +if (!copp || proto >= COPP_PROTO_MAX) { +return NULL; +} + +const char *meter = smap_get(>meters, copp_proto_names[proto]); + +if (meter && shash_find(meter_groups, meter)) { +return meter; +} + +return NULL; +} + +const char * +copp_port_meter_get(enum copp_proto proto, const struct nbrec_copp *port_copp, +const struct nbrec_copp *dp_copp, +const struct shash *meter_groups) +{ +const char *meter = copp_meter_get(proto, port_copp, meter_groups); + +if (!meter) { +return copp_meter_get(proto, dp_copp,
[ovs-dev] [RFC PATCH v2 ovn 1/5] ovn-controller: Add support for Logical_Flow control meters.
Add a new 'controller_meter' column to OVN Southbound Logical_Flow table. This stores an optional string which should correspond to the Meter that must be used for rate limiting controller actions generated by packets hitting the flow. Add a new 'ofctrl_add_flow_meter' function to create a new 'ovn_flow' with an attached controller meter. Change ofctrl_check_and_add_flow to allow specifying a meter ID for packets that are punted to controller. Change consider_logical_flow to parse controller_meter from the logical flow and use it when building openflow entries. Add a new 'ctrl_meter_id' field to 'struct ovnact_encode_params' to be used when encoding controller actions from logical flow actions. Signed-off-by: Dumitru Ceara --- controller/lflow.c| 37 ++ controller/ofctrl.c | 29 +- controller/ofctrl.h | 13 ++-- controller/physical.c |3 ++- include/ovn/actions.h |2 ++ lib/actions.c | 54 + ovn-sb.ovsschema |6 - ovn-sb.xml|6 + 8 files changed, 109 insertions(+), 41 deletions(-) diff --git a/controller/lflow.c b/controller/lflow.c index 36150bd..6d2ddf7 100644 --- a/controller/lflow.c +++ b/controller/lflow.c @@ -536,6 +536,26 @@ update_conj_id_ofs(uint32_t *conj_id_ofs, uint32_t n_conjs) return true; } +static void +lflow_parse_ctrl_meter(const struct sbrec_logical_flow *lflow, + struct ovn_extend_table *meter_table, + uint32_t *meter_id) +{ +*meter_id = NX_CTLR_NO_METER; + +if (lflow->controller_meter) { +*meter_id = ovn_extend_table_assign_id(meter_table, + lflow->controller_meter, + lflow->header_.uuid); +if (*meter_id == EXT_TABLE_ID_INVALID) { +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); +VLOG_WARN_RL(, "Unable to assign id for meter: %s", + lflow->controller_meter); +return; +} +} +} + static bool consider_logical_flow( struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath, @@ -669,6 +689,12 @@ consider_logical_flow( return true; } +/* Parse any meter to be used if this flow should punt packets to + * controller. + */ +uint32_t ctrl_meter_id = NX_CTLR_NO_METER; +lflow_parse_ctrl_meter(lflow, meter_table, _meter_id); + /* Encode OVN logical actions into OpenFlow. */ uint64_t ofpacts_stub[1024 / 8]; struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(ofpacts_stub); @@ -686,6 +712,7 @@ consider_logical_flow( .output_ptable = output_ptable, .mac_bind_ptable = OFTABLE_MAC_BINDING, .mac_lookup_ptable = OFTABLE_MAC_LOOKUP, +.ctrl_meter_id = ctrl_meter_id, }; ovnacts_encode(ovnacts.data, ovnacts.size, , ); ovnacts_free(ovnacts.data, ovnacts.size); @@ -717,9 +744,10 @@ consider_logical_flow( } } if (!m->n) { -ofctrl_add_flow(flow_table, ptable, lflow->priority, -lflow->header_.uuid.parts[0], >match, , ->header_.uuid); +ofctrl_add_flow_meter(flow_table, ptable, lflow->priority, + lflow->header_.uuid.parts[0], >match, + , >header_.uuid, + ctrl_meter_id); } else { uint64_t conj_stubs[64 / 8]; struct ofpbuf conj; @@ -736,7 +764,8 @@ consider_logical_flow( } ofctrl_add_or_append_flow(flow_table, ptable, lflow->priority, 0, - >match, , >header_.uuid); + >match, , >header_.uuid, + ctrl_meter_id); ofpbuf_uninit(); } } diff --git a/controller/ofctrl.c b/controller/ofctrl.c index 10edd84..81d001f 100644 --- a/controller/ofctrl.c +++ b/controller/ofctrl.c @@ -67,13 +67,15 @@ struct ovn_flow { struct ofpact *ofpacts; size_t ofpacts_len; uint64_t cookie; +uint32_t ctrl_meter_id; /* Meter to be used for controller actions. */ }; static struct ovn_flow *ovn_flow_alloc(uint8_t table_id, uint16_t priority, uint64_t cookie, const struct match *match, const struct ofpbuf *actions, - const struct uuid *sb_uuid); + const struct uuid *sb_uuid, + uint32_t meter_id); static uint32_t ovn_flow_match_hash(const struct ovn_flow *); static struct ovn_flow *ovn_flow_lookup(struct hmap *flow_table,
[ovs-dev] [RFC PATCH v2 ovn 0/5] Add CoPP (Control Plane Protection).
This series adds support for user configured control plane protection policies. Such policies are implemented through OVS meters and are useful for protecting ovn-controller from being overloaded by control traffic (any type of traffic that requires ovn-controller additional processing). First, logical flows are extended to allow ovn-northd to refer to a specific Meter that would be used when traffic matching logical flows is punted to ovn-controller. The following commit builds the infrastructure required for configuring control plane policies and adds code to ovn-northd to allow creation of logical flows that have an associated control meter. Then CoPP is implemented for all types of traffic that currently gets punted to ovn-controller. CoPP can be applied at different levels: logical switch port, logical router port, logical switch, logical router. Whenever a CoPP policy is configured for a logical port, it will take precedence over the policy configured at router/switch level. However, per port CoPP policies are allowed only for types of traffic that are currently handled by ovn-northd through logical flows that also match on "inport". This could be further refined by follow-up commits. Post-RFC remaining items: - add autotests for CoPP Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/362732.html CC: Han Zhou CC: Numan Siddique > Signed-off-by: Dumitru Ceara Dumitru Ceara (5): ovn-controller: Add support for Logical_Flow control meters. ovn-northd: Add support for CoPP. ovn-northd: Add CoPP policies for flows that punt packets to ovn-controller. ovn-northd: Extend metering to Controller-Events NEWS: Add CoPP support. NEWS |1 controller/lflow.c| 37 controller/ofctrl.c | 29 +++ controller/ofctrl.h | 13 + controller/physical.c |3 include/ovn/actions.h |3 lib/actions.c | 104 --- lib/automake.mk |2 lib/copp.c| 99 +++ lib/copp.h| 58 ++ northd/ovn-northd.c | 300 ++--- ovn-nb.ovsschema | 24 ++- ovn-nb.xml| 91 ++ ovn-sb.ovsschema |6 - ovn-sb.xml|6 + tests/ovn.at |3 utilities/ovn-nbctl.8.xml | 94 ++ utilities/ovn-nbctl.c | 412 + 18 files changed, 1099 insertions(+), 186 deletions(-) create mode 100644 lib/copp.c create mode 100644 lib/copp.h --- v2: - Address Han's comment and split the RFC in a series. - Simplify a bit the logic of applying CoPP on logical ports. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 ovn 0/5] Add CoPP (Control Plane Protection).
On Tue, Nov 5, 2019 at 1:47 PM Dumitru Ceara wrote: > > This series adds support for user configured control plane protection > policies. Such policies are implemented through OVS meters and are > useful for protecting ovn-controller from being overloaded by control > traffic (any type of traffic that requires ovn-controller additional > processing). > > First, logical flows are extended to allow ovn-northd to refer to a > specific Meter that would be used when traffic matching logical flows > is punted to ovn-controller. > > The following commit builds the infrastructure required for configuring > control plane policies and adds code to ovn-northd to allow creation > of logical flows that have an associated control meter. > > Then CoPP is implemented for all types of traffic that currently gets > punted to ovn-controller. > > CoPP can be applied at different levels: logical switch port, logical > router port, logical switch, logical router. Whenever a CoPP policy > is configured for a logical port, it will take precedence over the > policy configured at router/switch level. However, per port CoPP > policies are allowed only for types of traffic that are currently > handled by ovn-northd through logical flows that also match on > "inport". This could be further refined by follow-up commits. > > Post-RFC remaining items: > - add autotests for CoPP Please ignore this, it was supposed to be sent as RFC. Sorry for the noise, Dumitru > > Reported-at: > https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/362732.html > CC: Han Zhou > CC: Numan Siddique > > Signed-off-by: Dumitru Ceara > > Dumitru Ceara (5): > ovn-controller: Add support for Logical_Flow control meters. > ovn-northd: Add support for CoPP. > ovn-northd: Add CoPP policies for flows that punt packets to > ovn-controller. > ovn-northd: Extend metering to Controller-Events > NEWS: Add CoPP support. > > > NEWS |1 > controller/lflow.c| 37 > controller/ofctrl.c | 29 +++ > controller/ofctrl.h | 13 + > controller/physical.c |3 > include/ovn/actions.h |3 > lib/actions.c | 104 --- > lib/automake.mk |2 > lib/copp.c| 99 +++ > lib/copp.h| 58 ++ > northd/ovn-northd.c | 300 ++--- > ovn-nb.ovsschema | 24 ++- > ovn-nb.xml| 91 ++ > ovn-sb.ovsschema |6 - > ovn-sb.xml|6 + > tests/ovn.at |3 > utilities/ovn-nbctl.8.xml | 94 ++ > utilities/ovn-nbctl.c | 412 > + > 18 files changed, 1099 insertions(+), 186 deletions(-) > create mode 100644 lib/copp.c > create mode 100644 lib/copp.h > > > --- > v2: > - Address Han's comment and split the RFC in a series. > - Simplify a bit the logic of applying CoPP on logical ports. > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v2 ovn 0/5] Add CoPP (Control Plane Protection).
This series adds support for user configured control plane protection policies. Such policies are implemented through OVS meters and are useful for protecting ovn-controller from being overloaded by control traffic (any type of traffic that requires ovn-controller additional processing). First, logical flows are extended to allow ovn-northd to refer to a specific Meter that would be used when traffic matching logical flows is punted to ovn-controller. The following commit builds the infrastructure required for configuring control plane policies and adds code to ovn-northd to allow creation of logical flows that have an associated control meter. Then CoPP is implemented for all types of traffic that currently gets punted to ovn-controller. CoPP can be applied at different levels: logical switch port, logical router port, logical switch, logical router. Whenever a CoPP policy is configured for a logical port, it will take precedence over the policy configured at router/switch level. However, per port CoPP policies are allowed only for types of traffic that are currently handled by ovn-northd through logical flows that also match on "inport". This could be further refined by follow-up commits. Post-RFC remaining items: - add autotests for CoPP Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/362732.html CC: Han Zhou CC: Numan Siddique > Signed-off-by: Dumitru Ceara Dumitru Ceara (5): ovn-controller: Add support for Logical_Flow control meters. ovn-northd: Add support for CoPP. ovn-northd: Add CoPP policies for flows that punt packets to ovn-controller. ovn-northd: Extend metering to Controller-Events NEWS: Add CoPP support. NEWS |1 controller/lflow.c| 37 controller/ofctrl.c | 29 +++ controller/ofctrl.h | 13 + controller/physical.c |3 include/ovn/actions.h |3 lib/actions.c | 104 --- lib/automake.mk |2 lib/copp.c| 99 +++ lib/copp.h| 58 ++ northd/ovn-northd.c | 300 ++--- ovn-nb.ovsschema | 24 ++- ovn-nb.xml| 91 ++ ovn-sb.ovsschema |6 - ovn-sb.xml|6 + tests/ovn.at |3 utilities/ovn-nbctl.8.xml | 94 ++ utilities/ovn-nbctl.c | 412 + 18 files changed, 1099 insertions(+), 186 deletions(-) create mode 100644 lib/copp.c create mode 100644 lib/copp.h --- v2: - Address Han's comment and split the RFC in a series. - Simplify a bit the logic of applying CoPP on logical ports. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [OVN][RAFT]Why left server cannot be added back to
Hi,Numan, When I run OVN/RAFT cluster, I found a server(which initiative to leave or be kicked off ) ,cannot be added back to origin cluster. I found the code as following, can you tell me the reason , many thanks! case RAFT_REC_NOTE: if (!strcmp(r->note, "left")) { return ovsdb_error(NULL, "record %llu indicates server has left " "the cluster; it cannot be added back (use " "\"ovsdb-tool join-cluster\" to add a new " "server)", rec_idx); Thanks, Yun ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [OVN][RAFT]Why left server cannot be added back to
On Tue, Nov 5, 2019 at 5:40 PM taoyunupt wrote: > > Hi,Numan, > When I run OVN/RAFT cluster, I found a server(which > initiative to leave or be kicked off ) ,cannot be added back to origin > cluster. I found the code as following, can you tell me the reason , many > thanks! > > case RAFT_REC_NOTE: > if (!strcmp(r->note, "left")) { > return ovsdb_error(NULL, "record %llu indicates server has left " >"the cluster; it cannot be added back (use " >"\"ovsdb-tool join-cluster\" to add a new " >"server)", rec_idx); > Thanks, > Yun Hi Yun, Probably Ben or Han can answer this question. Did you look into the documentation about raft in ovs man pages. Maybe you can find the reason for this . Thanks Numan ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] Подготовим достоверную базу организаций
Приветствуем вас Мы специалисты в подготовке любых баз данных компаний по любой отрасли в любом городе. Рекомендуем новую базу данных потенциальных клиентов. Можно сделать выборку по категории, городу или заказать базу компаний по всей Российской Федарации. Эта база предприятий включает все работающие компании РФ. Оставьте свои контакты тут и мы вам перезвоним http://inbase.online ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v11 1/3] netdev-dpdk: Reuse vhost function for dpdk ETH custom stats.
On 05/11/2019 04:52, Sriram Vatala wrote: > Hi All, > Please consider this as a gentle remainder. > > Thanks & Regards, > Sriram. > > -Original Message- > From: Sriram Vatala > Sent: 30 October 2019 11:57 > To: 'ovs-dev@openvswitch.org' ; > 'ktray...@redhat.com' ; 'i.maxim...@ovn.org' > > Cc: 'Ilya Maximets' > Subject: RE: [PATCH v11 1/3] netdev-dpdk: Reuse vhost function for dpdk ETH > custom stats. > > Hi All, > Please review the update patch. > > Changes since v10 : > Corrected the spelling mistake in the commit message. > Hi Sriram, I've acked v10, so not planning to re-review unless there are some other changes. thanks, Kevin. p.s. In general if it is only minor changes it is usually ok to keep acks from a previous version. If it is doubtful you can check with the reviewer. > Thanks & Regards, > Sriram. > > -Original Message- > From: Sriram Vatala > Sent: 29 October 2019 20:20 > To: ovs-dev@openvswitch.org; ktray...@redhat.com; i.maxim...@ovn.org > Cc: Ilya Maximets ; Sriram Vatala > > Subject: [PATCH v11 1/3] netdev-dpdk: Reuse vhost function for dpdk ETH > custom stats. > > From: Ilya Maximets > > This is yet another refactoring for upcoming detailed drop stats. > It allows to use single function for all the software calculated statistics > in netdev-dpdk for both vhost and ETH ports. > > UINT64_MAX used as a marker for non-supported statistics in a same way as > it's done in bridge.c for common netdev stats. > > Co-authored-by: Sriram Vatala > Cc: Sriram Vatala > Signed-off-by: Ilya Maximets > Signed-off-by: Sriram Vatala > --- > lib/netdev-dpdk.c | 69 +++ > 1 file changed, 40 insertions(+), 29 deletions(-) > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index > 04e1a2d1b..2cc2516a9 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -474,6 +474,8 @@ struct netdev_rxq_dpdk { static void > netdev_dpdk_destruct(struct netdev *netdev); static void > netdev_dpdk_vhost_destruct(struct netdev *netdev); > > +static int netdev_dpdk_get_sw_custom_stats(const struct netdev *, > + struct netdev_custom_stats > +*); > static void netdev_dpdk_clear_xstats(struct netdev_dpdk *dev); > > int netdev_dpdk_get_vid(const struct netdev_dpdk *dev); @@ -1174,7 +1176,7 > @@ common_construct(struct netdev *netdev, dpdk_port_t port_no, > dev->rte_xstats_ids = NULL; > dev->rte_xstats_ids_size = 0; > > -dev->tx_retries = 0; > +dev->tx_retries = (dev->type == DPDK_DEV_VHOST) ? 0 : UINT64_MAX; > > return 0; > } > @@ -2775,7 +2777,9 @@ netdev_dpdk_get_custom_stats(const struct netdev > *netdev, > > uint32_t i; > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > -int rte_xstats_ret; > +int rte_xstats_ret, sw_stats_size; > + > +netdev_dpdk_get_sw_custom_stats(netdev, custom_stats); > > ovs_mutex_lock(>mutex); > > @@ -2790,23 +2794,22 @@ netdev_dpdk_get_custom_stats(const struct netdev > *netdev, > if (rte_xstats_ret > 0 && > rte_xstats_ret <= dev->rte_xstats_ids_size) { > > -custom_stats->size = rte_xstats_ret; > -custom_stats->counters = > -(struct netdev_custom_counter *) > xcalloc(rte_xstats_ret, > -sizeof(struct netdev_custom_counter)); > +sw_stats_size = custom_stats->size; > +custom_stats->size += rte_xstats_ret; > +custom_stats->counters = xrealloc(custom_stats->counters, > + custom_stats->size * > + sizeof > + *custom_stats->counters); > > for (i = 0; i < rte_xstats_ret; i++) { > -ovs_strlcpy(custom_stats->counters[i].name, > +ovs_strlcpy(custom_stats->counters[sw_stats_size + > + i].name, > netdev_dpdk_get_xstat_name(dev, > > dev->rte_xstats_ids[i]), > NETDEV_CUSTOM_STATS_NAME_SIZE); > -custom_stats->counters[i].value = values[i]; > +custom_stats->counters[sw_stats_size + i].value = > + values[i]; > } > } else { > VLOG_WARN("Cannot get XSTATS values for port: > "DPDK_PORT_ID_FMT, >dev->port_id); > -custom_stats->counters = NULL; > -custom_stats->size = 0; > /* Let's clear statistics cache, so it will be > * reconfigured */ > netdev_dpdk_clear_xstats(dev); @@ -2821,39 +2824,47 @@ > netdev_dpdk_get_custom_stats(const struct netdev *netdev, } > > static int > -netdev_dpdk_vhost_get_custom_stats(const struct netdev *netdev, > - struct netdev_custom_stats > *custom_stats) > +netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev, > +struct
[ovs-dev] [PATCH ovn v2 3/3] Send service monitor health checks
From: Numan Siddique ovn-controller will periodically sends out the service monitor packets for the services configured in the SB DB Service_Monitor table. This patch makes use of the action - handle_svc_check to handle the service monitor reply packets from the service. This patch supports IPv4 TCP and UDP service monitoring. For TCP services, it sends out a TCP SYN packet and expects TCP ACK packet in response. If the response is received on time, the status of the service is set to "online", otherwise it is set to "offline". For UDP services, it sends out a empty UDP packet and doesn't expect any reply. In case the service is down, the host running the service, sends out ICMP type 3 code 4 (destination unreachable) packet. If ovn-controller receives this ICMP packet, it sets the status of the service to "offline". Right now only IPv4 service monitoring is supported. An upcoming patch will add the support for IPv6. Acked-by: Mark Michelson Signed-off-by: Numan Siddique --- controller/ovn-controller.c | 2 + controller/pinctrl.c | 775 -- controller/pinctrl.h | 2 + northd/ovn-northd.8.xml | 10 + northd/ovn-northd.c | 18 + tests/ovn.at | 119 ++ tests/system-common-macros.at | 1 + tests/system-ovn.at | 180 8 files changed, 1081 insertions(+), 26 deletions(-) diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 9ab98be5c..27cb4885b 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -2083,6 +2083,8 @@ main(int argc, char *argv[]) sbrec_dns_table_get(ovnsb_idl_loop.idl), sbrec_controller_event_table_get( ovnsb_idl_loop.idl), +sbrec_service_monitor_table_get( +ovnsb_idl_loop.idl), br_int, chassis, _runtime_data.local_datapaths, _runtime_data.active_tunnels); diff --git a/controller/pinctrl.c b/controller/pinctrl.c index a90ee73d6..8fc31d38a 100644 --- a/controller/pinctrl.c +++ b/controller/pinctrl.c @@ -38,6 +38,7 @@ #include "openvswitch/ofp-switch.h" #include "openvswitch/ofp-util.h" #include "openvswitch/vlog.h" +#include "lib/random.h" #include "lib/dhcp.h" #include "ovn-controller.h" @@ -282,6 +283,22 @@ static void run_put_vport_bindings( static void wait_put_vport_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn); static void pinctrl_handle_bind_vport(const struct flow *md, struct ofpbuf *userdata); +static void pinctrl_handle_svc_check(struct rconn *swconn, + const struct flow *ip_flow, + struct dp_packet *pkt_in, + const struct match *md); +static void init_svc_monitors(void); +static void destroy_svc_monitors(void); +static void sync_svc_monitors( +struct ovsdb_idl_txn *ovnsb_idl_txn, +const struct sbrec_service_monitor_table *svc_mon_table, +struct ovsdb_idl_index *sbrec_port_binding_by_name, +const struct sbrec_chassis *our_chassis) +OVS_REQUIRES(pinctrl_mutex); +static void svc_monitors_run(struct rconn *swconn, + long long int *svc_monitors_next_run_time) +OVS_REQUIRES(pinctrl_mutex); +static void svc_monitors_wait(long long int svc_monitors_next_run_time); COVERAGE_DEFINE(pinctrl_drop_put_mac_binding); COVERAGE_DEFINE(pinctrl_drop_buffered_packets_map); @@ -444,6 +461,7 @@ pinctrl_init(void) init_event_table(); ip_mcast_snoop_init(); init_put_vport_bindings(); +init_svc_monitors(); pinctrl.br_int_name = NULL; pinctrl_handler_seq = seq_create(); pinctrl_main_seq = seq_create(); @@ -1981,6 +1999,13 @@ process_packet_in(struct rconn *swconn, const struct ofp_header *msg) ovs_mutex_unlock(_mutex); break; +case ACTION_OPCODE_HANDLE_SVC_CHECK: +ovs_mutex_lock(_mutex); +pinctrl_handle_svc_check(swconn, , , + _metadata); +ovs_mutex_unlock(_mutex); +break; + default: VLOG_WARN_RL(, "unrecognized packet-in opcode %"PRIu32, ntohl(ah->opcode)); @@ -2050,6 +2075,7 @@ pinctrl_handler(void *arg_) static long long int send_garp_rarp_time = LLONG_MAX; /* Next multicast query (IGMP) in ms. */ static long long int send_mcast_query_time = LLONG_MAX; +static long long int svc_monitors_next_run_time = LLONG_MAX; swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP13_VERSION); @@ -2110,11 +2136,16 @@ pinctrl_handler(void *arg_) } } +ovs_mutex_lock(_mutex); +svc_monitors_run(swconn, _monitors_next_run_time); +ovs_mutex_unlock(_mutex); +
[ovs-dev] [PATCH ovn v2 2/3] Add a new action - handle_svc_check
From: Numan Siddique This action will be used in an upcoming patch to handle the service monitor replies from ovn-controller when it sends out service monitor requests. This action gets translated to openflow controller action. Acked-by: Mark Michelson Signed-off-by: Numan Siddique --- include/ovn/actions.h | 17 - lib/actions.c | 42 ++ ovn-sb.xml| 17 + tests/ovn.at | 13 + utilities/ovn-trace.c | 3 +++ 5 files changed, 91 insertions(+), 1 deletion(-) diff --git a/include/ovn/actions.h b/include/ovn/actions.h index f4997e9c9..047a8d737 100644 --- a/include/ovn/actions.h +++ b/include/ovn/actions.h @@ -88,7 +88,8 @@ struct ovn_extend_table; OVNACT(OVNFIELD_LOAD, ovnact_load)\ OVNACT(CHECK_PKT_LARGER, ovnact_check_pkt_larger) \ OVNACT(TRIGGER_EVENT, ovnact_controller_event) \ -OVNACT(BIND_VPORT,ovnact_bind_vport) +OVNACT(BIND_VPORT,ovnact_bind_vport) \ +OVNACT(HANDLE_SVC_CHECK, ovnact_handle_svc_check) /* enum ovnact_type, with a member OVNACT_ for each action. */ enum OVS_PACKED_ENUM ovnact_type { @@ -352,6 +353,12 @@ struct ovnact_bind_vport { struct expr_field vport_parent; /* Logical virtual port's port name. */ }; +/* OVNACT_HANDLE_SVC_CHECK. */ +struct ovnact_handle_svc_check { +struct ovnact ovnact; +struct expr_field port; /* Logical port name. */ +}; + /* Internal use by the helpers below. */ void ovnact_init(struct ovnact *, enum ovnact_type, size_t len); void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len); @@ -537,6 +544,14 @@ enum action_opcode { *MFF_LOG_INPORT. */ ACTION_OPCODE_BIND_VPORT, + +/* "handle_svc_check(port)"." + * + * Arguments are passed through the packet metadata and data, as follows: + * + * MFF_LOG_INPORT = port + */ +ACTION_OPCODE_HANDLE_SVC_CHECK, }; /* Header. */ diff --git a/lib/actions.c b/lib/actions.c index a999a4fda..586d7b75d 100644 --- a/lib/actions.c +++ b/lib/actions.c @@ -2814,6 +2814,46 @@ ovnact_bind_vport_free(struct ovnact_bind_vport *bp) free(bp->vport); } +static void +parse_handle_svc_check(struct action_context *ctx OVS_UNUSED) +{ + if (!lexer_force_match(ctx->lexer, LEX_T_LPAREN)) { +return; +} + +struct ovnact_handle_svc_check *svc_chk = +ovnact_put_HANDLE_SVC_CHECK(ctx->ovnacts); +action_parse_field(ctx, 0, false, _chk->port); +lexer_force_match(ctx->lexer, LEX_T_RPAREN); +} + +static void +format_HANDLE_SVC_CHECK(const struct ovnact_handle_svc_check *svc_chk, +struct ds *s) +{ +ds_put_cstr(s, "handle_svc_check("); +expr_field_format(_chk->port, s); +ds_put_cstr(s, ");"); +} + +static void +encode_HANDLE_SVC_CHECK(const struct ovnact_handle_svc_check *svc_chk, +const struct ovnact_encode_params *ep OVS_UNUSED, +struct ofpbuf *ofpacts) +{ +const struct arg args[] = { +{ expr_resolve_field(_chk->port), MFF_LOG_INPORT }, +}; +encode_setup_args(args, ARRAY_SIZE(args), ofpacts); +encode_controller_op(ACTION_OPCODE_HANDLE_SVC_CHECK, ofpacts); +encode_restore_args(args, ARRAY_SIZE(args), ofpacts); +} + +static void +ovnact_handle_svc_check_free(struct ovnact_handle_svc_check *sc OVS_UNUSED) +{ +} + /* Parses an assignment or exchange or put_dhcp_opts action. */ static void parse_set_action(struct action_context *ctx) @@ -2931,6 +2971,8 @@ parse_action(struct action_context *ctx) parse_trigger_event(ctx, ovnact_put_TRIGGER_EVENT(ctx->ovnacts)); } else if (lexer_match_id(ctx->lexer, "bind_vport")) { parse_bind_vport(ctx); +} else if (lexer_match_id(ctx->lexer, "handle_svc_check")) { +parse_handle_svc_check(ctx); } else { lexer_syntax_error(ctx->lexer, "expecting action"); } diff --git a/ovn-sb.xml b/ovn-sb.xml index 335f9031b..82167c488 100644 --- a/ovn-sb.xml +++ b/ovn-sb.xml @@ -2097,6 +2097,23 @@ tcp.flags = RST; set to P. + +handle_svc_check(P); + + +Parameters: logical port string field P. + + + +Handles the service monitor reply received from the VIF of +the logical port P. ovn-controller +periodically sends out the service monitor packets for the +services configured in the +table and this action updates the status of those services. + + + Example: handle_svc_check(inport); + diff --git a/tests/ovn.at b/tests/ovn.at index 410f4b514..b30f12c9a 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -1468,6 +1468,19 @@ bind_vport("xyzzy",; bind_vport("xyzzy", inport; Syntax error at `;' expecting `)'. +# handle_svc_check +handle_svc_check(inport); +encodes as
[ovs-dev] [PATCH ovn v2 0/3] Health check feature for Load balancer backends
From: Numan Siddique This series adds load balancer health check feature. With this ovn-controllers will periodically check the status of the backend services. Only those services which are online/active will be considered for load balancing. Right now this feature is restricted to IPv4 Load balancers only. CMS needs to enable this feature and the load balancer vips and backends should have L4 port defined. For TCP backends, the local ovn-controller which binds that service's VIF, will periodically send a SYN packet and would expect SYN-ACK response to set the status of that service to online. If no response is received within the timeout, then the service status is set to offline. For UDP backends, the local ovn-controller which binds that service's VIF, will periodically send an UDP packet and expects no reply. If no reply is received within the timeout vallue, the service status is set to online. In case the service is down, then ovn-controller expects ICMP unreachable packet and upon receiving this ICMP packets, it sets the status to offline. ovn-northd adds only those backends whose status is 'online' or empty to the ct_lb action. v1 -> v2 --- * Addressed review comment from Mark in p1. * Rebased to latest master and resolved conflicts. Numan Siddique (3): ovn-northd: Add support for Load Balancer health check Add a new action - handle_svc_check Send service monitor health checks controller/ovn-controller.c | 2 + controller/pinctrl.c | 775 -- controller/pinctrl.h | 2 + include/ovn/actions.h | 17 +- lib/actions.c | 42 ++ northd/ovn-northd.8.xml | 85 +++- northd/ovn-northd.c | 510 -- ovn-nb.ovsschema | 25 +- ovn-nb.xml| 68 +++ ovn-sb.ovsschema | 33 +- ovn-sb.xml| 102 + tests/ovn-northd.at | 215 ++ tests/ovn.at | 132 ++ tests/system-common-macros.at | 1 + tests/system-ovn.at | 180 utilities/ovn-trace.c | 3 + 16 files changed, 2119 insertions(+), 73 deletions(-) -- 2.23.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn v2 1/3] ovn-northd: Add support for Load Balancer health check
From: Numan Siddique The present Load balancer feature in OVN provides load balancing functionality to the back end ips without checking if the chosen backend ip is reachable or not. In case a back end service is down and if that IP is chosen, then packet will be lost. This patch series adds the health check feature for these backend IPs and only active backend IPs are considered for load balancing. CMS needs to enable this functionality. In the SB DB a new table Service_Monitor is added. For every backend IP in the load balancer for which health check is configured, a new row in the Service_Monitor table is created. In the upcoming patch in this series, ovn-controller will monitor the services set in this table by generating a health check packet. Health checks are supported only for IPv4 Load balancers in this patch. Existing load balancers will be unaffected after this patch. Acked-by: Mark Michelson Signed-off-by: Numan Siddique --- northd/ovn-northd.8.xml | 75 +- northd/ovn-northd.c | 492 ovn-nb.ovsschema| 25 +- ovn-nb.xml | 68 ++ ovn-sb.ovsschema| 33 ++- ovn-sb.xml | 85 +++ tests/ovn-northd.at | 215 ++ 7 files changed, 947 insertions(+), 46 deletions(-) diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index 0a33dcd9e..2e38e2d90 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -308,6 +308,16 @@ previously created, it will be associated to the empty_lb logical flow + + This table also has a priority-110 flow with the match + eth.src == E for all logical switch + datapaths to move traffic to the next table. Where E + is the service monitor mac defined in the + colum of table. + + Ingress Table 5: Pre-stateful @@ -476,7 +486,10 @@ , where args contains comma separated IP addresses (and optional port numbers) to load balance to. The address family of the IP addresses of args is the same as the address family -of VIP +of VIP. If health check is enabled, then args +will only contain those endpoints whose service monitor status entry +in OVN_Southbound db is either online or +empty. For all the configured load balancing rules for a switch in @@ -698,6 +711,51 @@ nd_na_router { + + + For each SVC_MON_SRC_IP defined in the value of + the column of + table, priority-110 + logical flow is added with the match + arp.tpa == SVC_MON_SRC_IP +arp.op == 1 and applies the action + + + +eth.dst = eth.src; +eth.src = E; +arp.op = 2; /* ARP reply. */ +arp.tha = arp.sha; +arp.sha = E; +arp.tpa = arp.spa; +arp.spa = A; +outport = inport; +flags.loopback = 1; +output; + + + + where E is the service monitor source mac defined in + the column in the table. This mac is used as the source mac + in the service monitor packets for the load balancer endpoint IP + health checks. + + + + SVC_MON_SRC_IP is used as the source ip in the + service monitor IPv4 packets for the load balancer endpoint IP + health checks. + + + + These flows are required if an ARP request is sent for the IP + SVC_MON_SRC_IP. + + + One priority-0 fallback flow that matches all packets and advances to the next table. @@ -1086,6 +1144,16 @@ output; tracker for packet de-fragmentation. + + This table also has a priority-110 flow with the match + eth.src == E for all logical switch + datapaths to move traffic to the next table. Where E + is the service monitor mac defined in the + colum of table. + + Egress Table 1: to-lport Pre-ACLs @@ -1926,7 +1994,10 @@ icmp6 { (and optional port numbers) to load balance to. If the router is configured to force SNAT any load-balanced packets, the above action will be replaced by flags.force_snat_for_lb = 1; -ct_lb(args);. +ct_lb(args);. If health check is enabled, then +args will only contain those endpoints whose service +monitor status entry in OVN_Southbound db is +either online or empty. diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index c23c270dc..b0f513de6 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -88,6 +88,13 @@ static struct eth_addr mac_prefix; static bool controller_event_en; +/* MAC allocated for service monitor usage. Just one mac is allocated + * for this purpose and ovn-controller's on each chassis will make use + * of this mac when sending out the packets to monitor the services + * defined in Service_Monitor Southbound table.
Re: [ovs-dev] [PATCH v14] Improved Packet Drop Statistics in OVS
Thanks for the review Eelco. Regards Anju -Original Message- From: Eelco Chaudron Sent: Tuesday, November 5, 2019 1:23 PM To: Anju Thomas Cc: d...@openvswitch.org; i.maxim...@samsung.com; b...@ovn.org; Rohith Basavaraja ; Keshav Gupta Subject: Re: [PATCH v14] Improved Packet Drop Statistics in OVS On 5 Nov 2019, at 5:35, Anju Thomas wrote: > Currently OVS maintains explicit packet drop/error counters only on > port level. Packets that are dropped as part of normal OpenFlow > processing are counted in flow stats of “drop” flows or as table > misses in table stats. These can only be interpreted by controllers > that know the semantics of the configured OpenFlow pipeline. > Without that knowledge, it is impossible for an OVS user to obtain > e.g. the total number of packets dropped due to OpenFlow rules. > > Furthermore, there are numerous other reasons for which packets can be > dropped by OVS slow path that are not related to the OpenFlow > pipeline. > The generated datapath flow entries include a drop action to avoid > further expensive upcalls to the slow path, but subsequent packets > dropped by the datapath are not accounted anywhere. > > Finally, the datapath itself drops packets in certain error > situations. > Also, these drops are today not accounted for.This makes it difficult > for OVS users to monitor packet drop in an OVS instance and to alert a > management system in case of a unexpected increase of such drops. > Also OVS trouble-shooters face difficulties in analysing packet drops. > > With this patch we implement following changes to address the issues > mentioned above. > > 1. Identify and account all the silent packet drop scenarios > > 2. Display these drops in ovs-appctl coverage/show > > Co-authored-by: Rohith Basavaraja > Co-authored-by: Keshav Gupta > Signed-off-by: Anju Thomas > Signed-off-by: Rohith Basavaraja > Signed-off-by: Keshav Gupta Thanks for following this trough! Did some quick sanity tests and based on my previous review of v13 and the diff to v14 it looks good to me! Acked-by: Eelco Chaudron ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev