On 9 Sep 2021, at 4:35, Chris Mi wrote:
> On 9/8/2021 9:57 PM, Eelco Chaudron wrote:
>>
>> On 8 Sep 2021, at 13:52, Chris Mi wrote:
>>
>>> On 9/6/2021 5:47 PM, Eelco Chaudron wrote:
>>>> On 6 Sep 2021, at 11:14, Chris Mi wrote:
>>>>
>>>>> On 9/3/2021 8:54 PM, Eelco Chaudron wrote:
>>>>>> On 3 Sep 2021, at 14:02, Eelco Chaudron wrote:
>>>>>>
>>>>>> On 15 Jul 2021, at 8:01, Chris Mi wrote:
>>>>>>
>>>>>> This patch set adds offload support for sFlow.
>>>>>>
>>>>>> Psample is a genetlink channel for packet sampling. TC action
>>>>>> act_sample
>>>>>> uses psample to send sampled packets to userspace.
>>>>>>
>>>>>> When offloading sample action to TC, userspace creates a
>>>>>> unique ID to
>>>>>> map sFlow action and tunnel info and passes this ID to kernel
>>>>>> instead
>>>>>> of the sFlow info. psample will send this ID and sampled
>>>>>> packet to
>>>>>> userspace. Using the ID, userspace can recover the sFlow info
>>>>>> and send
>>>>>> sampled packet to the right sFlow monitoring host.
>>>>>>
>>>>>> Hi Chris,
>>>>>>
>>>>>> One thing missing from this patchset is a test case. I think we
>>>>>> should add it, as I’m going over this manually every patch
>>>>>> iteration.
>>>>>>
>>>>>> I would add the following:
>>>>>>
>>>>>> *
>>>>>>
>>>>>> Set the sample rate to 1, send 100 packets and make sure you
>>>>>> receive all of them
>>>>>>
>>>>>> o Also, verify the output of “ovs-appctl dpctl/dump-flows
>>>>>> system@ovs-system type=tc” is correct, i.e., matches the
>>>>>> kernel output
>>>>>> *
>>>>>>
>>>>>> Set the sample rate to 10, send 100 packets and make sure you
>>>>>> receive 10.
>>>>>>
>>>>>> o Also, verify the output of “ovs-appctl dpctl/dump-flows
>>>>>> system@ovs-system type=tc” is correct, i.e., matches the
>>>>>> kernel output
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Eelco
>>>>>>
>>>>>> PS: I also had a problem where only one packet got sent to the
>>>>>> collector, and then no more packets would arrive. Of course, when
>>>>>> I added some debug code, it never happened, and when removing the
>>>>>> debug code, it also worked fine :( Did you ever experience
>>>>>> something like this? I will play a bit more when reviewing
>>>>>> specific code, maybe it will happen again.
>>>>>>
>>>>>>
>>>>>> One additional thing I’m seeing is the following:
>>>>>>
>>>>>> $ ovs-appctl dpctl/dump-flows system@ovs-system type=tc
>>>>>> recirc_id(0),in_port(3),eth(src=52:54:00:88:51:38,dst=04:f4:bc:28:57:01),eth_type(0x0800),ipv4(frag=no),
>>>>>> packets:7144, bytes:600096, used:7.430s,
>>>>>> actions:sample(sample=10.0%,actions(userspace(pid=3925927229,sFlow(vid=0,pcp=0,output=2147483650),actions))),2
>>>>>>
>>>>>> Sometimes I get a rather large sFlow(output=) value in the sFlow output.
>>>>>> Converting the above to hex, 0x80000002, where I see this in
>>>>>> fix_sflow_action() as being mentioned multiple output ports? This seems
>>>>>> wrong to me as it should be 3 (as it always is in the none hw offload
>>>>>> case). This odd value is also reported to the sFlow samples.
>>>>>>
>>>>>> Unfortunately, I do not have a reproducer for this.
>>>>>>
>>>>> Actually, in the very beginning of the development when I have a lot of
>>>>> debug code, I also observed such odd port number sometimes.
>>>>> Such rule will disappear soon. So it is hard to test such corner case.
>>>>> And it doesn't affect the functionality.
>>>>> So current code didn't deal with such corner case.
>>>> I’m getting this almost every time I restart OVS and initiate a flow
>>>> (using a simple ping). This does not go away by itself, it stays until the
>>>> flow times out. So this should be investigated and fixed.
>>>>
>>>> My test setup is rather simple, just two ports, one is a VM one is a
>>>> external host, which I ping from the VM. All default config, so using the
>>>> NORMAL rule. This is my sFlow config:
>>>>
>>>> ovs-vsctl -- --id=@sflow create sflow agent=lo \
>>>> target="127.0.0.1" header=128 \
>>>> sampling=10 polling=4 \
>>>> -- set bridge ovs_pvp_br0 sflow=@sflow
>>>>
>>> When writing the test, I can reproduce it. Not sure what changed. But it
>>> happens even without offload. I don't know what need to be fixed.
>>> OVS generates the DP rule and installs it. If offload is enabled,
>>> sFlow(vid=0,pcp=0,output=2147483650) (actually the whole action) is saved in
>>> dpif_sflow_attr->action when installing the DP rule. When receiving sampled
>>> packets, tc or driver passes the gid to ovs daemon and
>>> the daemon will find the dpif_sflow_attr->action using the gid. TC didn't
>>> change it. It is up to OVS sflow engine to process it.
>> Nice! I did not see it in the none-offload case, but if you do, it might not
>> be related to your patchset and we should not worry about it for now :)
>> I’m assuming that you do get the 2147483650 value from the nlmsg attributes
>> passed to the offload handler.
>>
> Yes, I got 2147483650. But give a second look, my rule is a little different:
>
> recirc_id(0),in_port(3),eth(src=26:f2:ed:17:7d:eb,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(frag=no),
> packets:0, bytes:0, used:0.000s,
> actions:userspace(pid=4294967295,sFlow(vid=0,pcp=0,output=2147483650),actions),1,2
>
> or
>
> recirc_id(0),in_port(enp8s0f0_1),eth(src=02:25:d0:10:01:02,dst=33:33:00:00:00:02),eth_type(0x86dd),ipv6(frag=no),
> packets:2, bytes:140, used:1.666s,
> actions:userspace(pid=4294967295,sFlow(vid=0,pcp=0,output=2147483650),actions),br,enp8s0f0_2
>
> It seems the above rule is expected because the output is multiple ports. But
> your rule only outputs to 1 port. So to be accurate, I haven't reproduced the
> issue you hit.
> The suspicious code is dpif_sflow_attr_equal(). In the early version, I
> compared the whole dpif_sflow_attr, but now I only compare the ufid. Maybe
> the ufid is duplicate.
> But it should not happen. Anyway, I will keep an eye on this issue. Thanks
> for your carefulness.
Here is some debug info, it seems related to an update to an existing handle,
which does not update the tc part:
Here the flow gets added:
2021-09-09T14:01:12.278Z|00001|netdev_offload_tc(handler1)|ERR|EC_DBG: Enter
b13915aa-c7a8-4574-8dcf-439f03929b8e.
2021-09-09T14:01:12.278Z|00002|netdev_offload_tc(handler1)|ERR|EC_DEBUG: port
-2147483646
2021-09-09T14:01:12.285Z|00003|netdev_offload_tc(handler1)|ERR|EC_DBG: EXIT ok
b13915aa-c7a8-4574-8dcf-439f03929b8e.
Here the revalidator updates it with new port information:
2021-09-09T14:01:12.537Z|00001|netdev_offload_tc(revalidator13)|ERR|EC_DBG:
Enter b13915aa-c7a8-4574-8dcf-439f03929b8e.
2021-09-09T14:01:12.537Z|00002|netdev_offload_tc(revalidator13)|ERR|EC_DEBUG:
port 3
2021-09-09T14:01:12.537Z|00004|netdev_offload_tc(revalidator13)|DBG|updating
old handle: 1 prio: 2
2021-09-09T14:01:12.553Z|00005|netdev_offload_tc(revalidator13)|ERR|EC_DBG:
EXIT ok b13915aa-c7a8-4574-8dcf-439f03929b8e.
Here is a dp dump showing the wrong/old port:
$ ovs-appctl dpctl/dump-flows -m system@ovs-system type=tc
ufid:b13915aa-c7a8-4574-8dcf-439f03929b8e,
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vnet4),packet_type(ns=0/0,id=0/0),eth(src=52:54:00:88:51:38,dst=04:f4:bc:28:57:01),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no),
packets:180, bytes:15120, used:0.230s, dp:tc,
actions:sample(sample=10.0%,actions(userspace(pid=2771969963,sFlow(vid=0,pcp=0,output=2147483650),actions))),enp5s0f0
Guess this is the area you need to look at del_filter_and_ufid_mapping():
2389 if (get_ufid_tc_mapping(ufid, &id) == 0) {
2390 VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
2391 id.handle, id.prio);
2392 info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id,
ufid);
2393 }
There might also be a problem in the sgid mapping as it’s also using the uuid
as the hash, and there could be a collision during the update phase.
I’m working on some escalations so, please take a look and see if you can fix
this.
I need to restart OVS, and then send a ping, and I see the problem once, I need
to restart OVS each time to see it.
I’ll wait for v15 to see how you fixed it ;)
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev