On 28 Jun 2022, at 10:52, Jianbo Liu wrote:
> On Tue, 2022-06-28 at 09:52 +0200, Eelco Chaudron wrote: >> >> >> On 27 Jun 2022, at 16:58, Jianbo Liu wrote: >> >>> On Mon, 2022-06-27 at 11:32 +0200, Eelco Chaudron wrote: >>>> >>>> >>>> On 21 Jun 2022, at 10:22, Jianbo Liu wrote: >>>> >>>>> On Mon, 2022-06-20 at 12:18 +0200, Eelco Chaudron wrote: >>>>>> On 27 May 2022, at 11:00, Jianbo Liu wrote: >>>>>> >>>>>>> OVS meters are created in advance and openflow rules refer >>>>>>> to >>>>>>> them >>>>>>> by >>>>>>> their unique ID. New tc_police API is used to offload them. >>>>>>> By >>>>>>> calling >>>>>>> the API, police actions are created and meters are mapped >>>>>>> to >>>>>>> them. >>>>>>> These actions then can be used in tc filter rules by the >>>>>>> index. >>>>>>> >>>>>>> Signed-off-by: Jianbo Liu <[email protected]> >>>>>>> --- >>>>>>> NEWS | 2 ++ >>>>>>> lib/dpif-netlink.c | 31 +++++++++++++++++--- >>>>>>> - >>>>>>> tests/system-offloads-traffic.at | 48 >>>>>>> ++++++++++++++++++++++++++++++++ >>>>>>> 3 files changed, 76 insertions(+), 5 deletions(-) >>>>>>> >>>>>>> diff --git a/NEWS b/NEWS >>>>>>> index eece0d0b2..dfd659d4e 100644 >>>>>>> --- a/NEWS >>>>>>> +++ b/NEWS >>>>>>> @@ -29,6 +29,8 @@ Post-v2.17.0 >>>>>>> - Windows: >>>>>>> * Conntrack support for TCPv6, UDPv6, ICMPv6, FTPv6. >>>>>>> * IPv6 Geneve tunnel support. >>>>>>> + - Linux datapath: >>>>>>> + * Add offloading meter tc police. >>>>>>> >>>>>>> >>>>>>> v2.17.0 - 17 Feb 2022 >>>>>>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c >>>>>>> index 06e1e8ca0..0af9ee77e 100644 >>>>>>> --- a/lib/dpif-netlink.c >>>>>>> +++ b/lib/dpif-netlink.c >>>>>>> @@ -4163,11 +4163,18 @@ static int >>>>>>> dpif_netlink_meter_set(struct dpif *dpif_, >>>>>>> ofproto_meter_id >>>>>>> meter_id, >>>>>>> struct ofputil_meter_config >>>>>>> *config) >>>>>>> { >>>>>>> + int err; >>>>>>> + >>>>>>> if (probe_broken_meters(dpif_)) { >>>>>>> return ENOMEM; >>>>>>> } >>>>>>> >>>>>>> - return dpif_netlink_meter_set__(dpif_, meter_id, >>>>>>> config); >>>>>>> + err = dpif_netlink_meter_set__(dpif_, meter_id, >>>>>>> config); >>>>>>> + if (!err && netdev_is_flow_api_enabled()) { >>>>>>> + meter_offload_set(meter_id, config); >>>>>> >>>>>> Although currently we always return 0, we should still >>>>>> account >>>>>> for it >>>>>> to change in the future, so we should set err to the return >>>>>> value. >>>>>> >>>>> >>>>> If there is err from meter_offload_set, it will be passed to >>>>> the >>>>> caller >>>>> of dpif_netlink_meter_set(). I don't agree with that, because >>>>> the >>>>> caller thinks meter_set operation fail, but actually not. >>>>> Besides, >>>>> we >>>>> allow the case that dp meter_set success, but offloading fails, >>>>> so >>>>> the >>>>> return the error of meter_offload_set seems unnecessary. >>>> >>>> If this is the case, we should change the >>>> dpif_netlink_meter_set() >>>> API to return void. >>>> And add a comment to the function why it would not return an >>>> error: >>>> >>> >>> OK. >>> >>>> --- a/lib/netdev-offload.c >>>> +++ b/lib/netdev-offload.c >>>> @@ -196,7 +196,7 @@ netdev_assign_flow_api(struct netdev *netdev) >>>> return -1; >>>> } >>>> >>>> -int >>>> +void >>>> meter_offload_set(ofproto_meter_id meter_id, >>>> struct ofputil_meter_config *config) >>>> { >>>> @@ -212,8 +212,8 @@ meter_offload_set(ofproto_meter_id meter_id, >>>> } >>>> } >>>> } >>>> - >>>> - return 0; >>>> + /* Offload APIs could fail, for example, because the offload >>>> is >>>> not >>>> + * supported. This is fine, as the offload API should take >>>> care >>>> of this. */ >>>> } >>>> >>>> >>>> >>>>>> + err = meter_offload_set(meter_id, config); >>>>>> >>>>>>> + } >>>>>>> + >>>> >>>> <SNIP> >>>> >>>>>>> diff --git a/tests/system-offloads-traffic.at >>>>>>> b/tests/system- >>>>>>> offloads-traffic.at >>>>>>> index 80bc1dd5c..7ec75340f 100644 >>>>>>> --- a/tests/system-offloads-traffic.at >>>>>>> +++ b/tests/system-offloads-traffic.at >>>>>>> @@ -168,3 +168,51 @@ matchall >>>>>>> ]) >>>>>>> OVS_TRAFFIC_VSWITCHD_STOP >>>>>>> AT_CLEANUP >>>>>>> + >>>>>>> +AT_SETUP([offloads - check if meter offloading ]) >>>>>> >>>>>> Can we replace if with interface, as I keep on reading it as >>>>>> "if". >>>>>> >>>>> >>>>> Sure. >>>>> >>>>>>> +AT_KEYWORDS([meter]) >>>>>>> +AT_SKIP_IF([test $SUPPORT_TC_INGRESS_PPS = "no"]) >>>>>>> +OVS_TRAFFIC_VSWITCHD_START() >>>>>>> + >>>>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:hw- >>>>>>> offload=true]) >>>>>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-meter br0 'meter=1 >>>>>>> pktps >>>>>>> bands=type=drop rate=1']) >>>>>>> + >>>>>>> +ADD_NAMESPACES(at_ns0, at_ns1) >>>>>>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", >>>>>>> "f0:00:00:01:01:01") >>>>>>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", >>>>>>> "f0:00:00:01:01:02") >>>>>>> + >>>>>>> +NS_CHECK_EXEC([at_ns0], [ip neigh add 10.1.1.2 lladdr >>>>>>> f0:00:00:01:01:02 dev p0]) >>>>>>> +NS_CHECK_EXEC([at_ns1], [ip neigh add 10.1.1.1 lladdr >>>>>>> f0:00:00:01:01:01 dev p1]) >>>>>>> + >>>>>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 >>>>>>> "actions=normal"]) >>>>>>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 10 -i 0.1 -w 2 >>>>>>> 10.1.1.2 | >>>>>>> FORMAT_PING], [0], [dnl >>>>>>> +10 packets transmitted, 10 received, 0% packet loss, time >>>>>>> 0ms >>>>>>> +]) >>>>>>> + >>>>>>> +NETNS_DAEMONIZE([at_ns1], [nc -u -l 5678 > /dev/null ], >>>>>>> [nc0.pid]) >>>>>>> + >>>>>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 del-flows br0]) >>>>>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 >>>>>>> "priority=10,in_port=ovs-p0,udp actions=meter:1,normal"]) >>>>>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 "priority=1 >>>>>>> actions=normal"]) >>>>>>> + >>>>>>> +NS_CHECK_EXEC([at_ns0], [sleep 0.5; echo "mark" | nc -u >>>>>>> 10.1.1.2 >>>>>>> 5678 -p 6789]) >>>>>> >>>>>> Any specific reason why you need the sleep 0.5 here? Is it to >>>>>> be >>>>>> sure >>>>>> the flow is programmed? >>>>> >>>>> I remember I added this because there are failures sometimes. I >>>>> don't >>>>> know why, but obviously they are related to this patchset. So I >>>>> added >>>>> the sleep to avoid them. It's only 0.5s, should be no problem, >>>>> right? >>>> >>>> I did a lot of runs, but could not get it to fail without it. So >>>> if >>>> it fails in your case it would be good to investigate. >>> >>> I can't reproduce today, though I run many times. It's related to >>> my >>> setup, I don't test on physical machine, but a virtual machine. >>> >>>> >>>>>> If so, you might just want to do a ovs-vsctl dump-flows and >>>>>> checke >>>>>> the output? >>>>>> >>>>> >>>>> I don't understand your qustion. I checked ovs-vsctl dump-flows >>>>> below. >>> >>> If I remember correctly, the used is "never", not "0.001s". >>> >>>> >>>> I mean in the failure case without he 0.5 seconds, what do you >>>> get. >>>> >>>>>>> +AT_CHECK([ovs-appctl dpctl/dump-flows | grep "meter" | >>>>>>> DUMP_CLEAN_SORTED], [0], [dnl >>>>>>> +in_port(2),eth(macs),eth_type(0x0800),ipv4(proto=17,frag=n >>>>>>> o), >>>>>>> packets:0, bytes:0, used:0.001s, actions:outputmeter(0),3 >>>>>>> +]) >>>>>> >>>>>> Here you verify the DP rule is inserted, but should you not >>>>>> also >>>>>> wait >>>>>> a second to make sure the meter is reset? >>>>> >>>>> OK. I will add. >>>>> >>>>>> Because in theory your could/should sent 11 packets in 1 >>>>>> second, >>>>>> so >>>>>> 10 should be dropped!? >>>>>> This is the case in the kernel environment, but with TC the >>>>>> learning >>>>>> packet is not passing trough the TC meter (this might also be >>>>>> a >>>>>> corner case we need to document). >>>>>> >>>>> >>>>> Yes, it's the reason. But where to document? >>>> >>>> Not sure where either, I guess in NEWS and maybe it’s time to add >>>> a >>>> howto/tc-offload.rst file? >>>> >>> >>> Maybe just adding comments here, to explain why it's the result. >>> And >>> it's not any related to "howto". >> >> I think this is an important difference in behaviour we should >> document for the end-user. So that’s why I think we need this in the >> documentation somewhere. >> >>>>>>> +for i in `seq 10`; do >>>>>>> +NS_CHECK_EXEC([at_ns0], [echo "mark" | nc -u 10.1.1.2 5678 >>>>>>> -p >>>>>>> 6789]) >>>>>>> +done >>>>>>> + >>>>>>> +AT_CHECK([ovs-appctl dpctl/dump-flows | grep "meter" | >>>>>>> DUMP_CLEAN_SORTED], [0], [dnl >>>>>>> +in_port(2),eth(macs),eth_type(0x0800),ipv4(proto=17,frag=n >>>>>>> o), >>>>>>> packets:10, bytes:330, used:0.001s, >>>>>>> actions:outputmeter(0),3 >>>>>>> +]) >>>>>>> + >>>>>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 meter-stats br0 | sed -e >>>>>>> 's/duration:[[0-9]].[[0-9]]*s/duration:0.001s/'], [0], [dnl >>>>>>> +OFPST_METER reply (OF1.3) (xid=0x2): >>>>>>> +meter:1 flow_count:1 packet_in_count:11 byte_in_count:377 >>>>>>> duration:0.001s bands: >>>>>> >>>>>> byte_in_count:517 is a lot larger than with kernel code, do >>>>>> we >>>>>> know >>>>>> why? We should document it. >>>>>> >>>>> >>>>> I think DP also count the bytes of mac layer. >>>> >>>> Any idea why? >>> >>> From ovs_meter_execute() in datapath/meter.c, the packet size of >>> band >>> stats is added by skb->len, which should include the len of mac >>> layer. >> >> So this needs to be fixed in the kernel. I can remember someone from >> Nvidia was already looking at this? > > Sorry, I don't know. > >> >>>>>>> +0: packet_count:9 byte_count:0 >>>>>> >>>>>> Here in theory we should report byte_count, like this: >>>>>> >>>>>> 0: packet_count:10 byte_count:470 >>>>>> >>>>>> However, TC does not support dropped byte count, only >>>>>> packet_count. >>>>>> So we should be ok for now, but we must add this limitation >>>>>> to >>>>>> the >>>>>> documentation somewhere that it's clear we will not report >>>>>> byte >>>>>> count >>>>>> with TC offload. >>>>>> >>>>>> If you run this test without HW offload enabled you can see >>>>>> the >>>>>> difference in behavior, and I think there should be none (or >>>>>> at >>>>>> least >>>>>> the corner cases should be documented). >>>>>> You could also add a "- offloads disabled" variant of this >>>>>> test, >>>>>> like >>>>>> other features do and add some additional reasoning why it's >>>>>> different there. >>>>>> >>>>> >>>>> Sure, I will add. >>>> >>>> Guess this could go to the same howto/tc-offload.rst in the >>>> limitations section? >>>> >>> >>> Maybe also add comments here, don't waste time to find info from >>> the >>> other document :) >> >> See above, end-users do not read test cases, so we need this in some >> user documentation. > > Sorry, I don't understand. Are we talking about the tests, and why do > we get these expected results regarding the numbers of bytes and > packets? Maybe adding comments here is enough, user may not notice > these differences if he doesn't run this tests, right? I’m talking about the fact that the “ovs-ofctl -O OpenFlow13 meter-stats br0” is not reporting byte count when hwoffload is enabled. This is a missing feature/difference from previous behavior and we should document this for the end-user. >>>>> >>>>>>> +]) >>>>>>> + >>>>>>> +OVS_TRAFFIC_VSWITCHD_STOP >>>>>>> +AT_CLEANUP >>>>>>> -- >>>>>>> 2.26.2 >>>>>> >>>>>> This concludes my review of v5. >>>>>> >>>>>> //Eelco >>>>>> >>>> >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
