On 6/14/22 22:24, Ilya Maximets wrote: > On 6/14/22 16:26, Oz Shlomo wrote: >> Hi Ilya, >> >> On 6/14/2022 4:03 PM, Ilya Maximets wrote: >>> On 6/14/22 10:27, Oz Shlomo via dev wrote: >>>> >>>> >>>> On 6/8/2022 3:16 AM, Frode Nordahl wrote: >>>>> On Tue, Jun 7, 2022 at 12:16 AM Ilya Maximets <i.maxim...@ovn.org> wrote: >>>>>> >>>>>> On 5/31/22 23:48, Ilya Maximets wrote: >>>>>>> On 5/31/22 21:15, Frode Nordahl wrote: >>>>>>>> On Mon, May 30, 2022 at 5:25 PM Frode Nordahl >>>>>> >>>>>> <snip> >>>>>> >>>>>>>> I've pushed the first part of the fix here: >>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fpipermail%2Fovs-dev%2F2022-May%2F394450.html&data=05%7C01%7Cozsh%40nvidia.com%7C7ab5490f9c334e9d877f08da4e0652fc%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637908086346878848%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=pq0CABjao2UWojg6yZut7RL%2FZEeuRou0qUVZKNYP3rQ%3D&reserved=0 >>>>>>> >>>>>>> Thanks! I saw that and I tend to think that it is correct. >>>>>>> I'll try to test it and apply in the next couple of days. >>>>>>> >>>>>>> One question about the test above: which entity actually adds >>>>>>> the ct_state to the packet or at which moment that happens? >>>>>>> I see it, but I'm not sure I fully understand that. Looks >>>>>>> like I'm missing smething obvious. >>>>>> >>>>>> I found what is going on there - kernel simply tracks everything >>>>>> if not told to do so, so ICMP packets create the ct entry and >>>>>> subsequent packets re-use it, so icmp replies have +trk set while >>>>>> entering OVS. >>>>> >>>>> Great, my hunch was that something along these lines was happening as >>>>> well, I have to admit the test case was found by locating something >>>>> closest to the real life use case and it proved to work as a good test >>>>> for this condition. >>>>> >>>>>> ---- >>>>>> >>>>>> Let's have some summary of the issues discovered here so far, >>>>>> including a few new issues: >>>>>> >>>>>> 1. ct states set externally are not tracked correctly by OVS. >>>>>> Fix: >>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fpipermail%2Fovs-dev%2F2022-May%2F394450.html&data=05%7C01%7Cozsh%40nvidia.com%7C7ab5490f9c334e9d877f08da4e0652fc%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637908086346878848%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=pq0CABjao2UWojg6yZut7RL%2FZEeuRou0qUVZKNYP3rQ%3D&reserved=0 >>>>>> Status: LGTM, will apply soon. >>>>>> This fixes the problem originally reported by Liam, IIUC. Right? >>>>> >>>>> Yes. >>>>> >>>>>> 2. Kernel ct() actions are trying to re-use the cached connection >>>>>> after the tuple changes. >>>>>> This ends up to be the OVN hairpin issue as reported here: >>>>>> >>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.launchpad.net%2Fubuntu%2F%2Bsource%2Fovn%2F%2Bbug%2F1967856&data=05%7C01%7Cozsh%40nvidia.com%7C7ab5490f9c334e9d877f08da4e0652fc%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637908086346878848%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=25B7VbtRFguupC7VoNjZK%2FWlasu%2BMSTUzJkszvEpDaQ%3D&reserved=0 >>>>>> >>>>>> Proposed Fix: >>>>>> >>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fnetdev%2F20220606221140.488984-1-i.maximets%40ovn.org%2FT%2F%23u&data=05%7C01%7Cozsh%40nvidia.com%7C7ab5490f9c334e9d877f08da4e0652fc%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637908086346878848%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=areRLYsAEbare7yo%2FxmIF9k2tMw2v8ZQkwHcR%2FEvV%2Bo%3D&reserved=0 >>>>>> >>>>>> Status: Needs review. >>>>> >>>>> I can confirm that the proposed fix resolves the OVN hairpin issue. It >>>>> also looks simple enough to be backportable all the way to where we >>>>> would need it (kernel 5.4.0). I'll have a look at giving this wider >>>>> exposure in an internal CI environment as a canary for any unintended >>>>> consequences if that would be helpful. >>>>> >>>> >>>> Sorry for jumping in late on this, as this patch was already accepted to >>>> the kernel. >>>> However, I have a concern that this patch would break the tc datapath when >>>> ovs hw offload is enabled. >>>> >>>> IIUC then this patch adds an implicit ovs_ct_clear call for 5-tuple modify >>>> actions. However, this implicit action will not apply to flows that use >>>> the tc datapath. >>>> >>>> Forde, can you verify that indeed this fix breaks the OVN hairpin use case >>>> when hw offload is enabled. >>> Hi, Oz. I don't think that this kernel fix breaks the TC datapath >>> as the packets processed by the openvswitch kernel module will not >>> go back to TC for further processing, IIUC. Also, it's not a full >>> ct_clear, because we're not clearing the flow key. >> >> A flow datapath is either in tc or in ovs. >> If hardware offload is enabled then ovs will create a tc flower entry. >> Therefore, packets for that flow will be processed by tc and not openvswitch. >> Note that hardware offload may be enabled even if there is no supporting >> hardware. TC software datapath is designed to be functionally equivalent to >> ovs. >> >> tc is processed before openvswitch in the kernel pipeline. Therefore, if a >> packet is matched by tc then it will not continue to openvswitch. Therefore >> my concern is that openvswitch change will not apply if ovs hardware offload >> is enabled. >> >>> >>> But I agree that the original bug exists in TC as well, since TC >>> just copied the ct() recircuation optimization from the openvswitch. >>> So, if there are subsequent ct actions with pedit in between, >>> TC will have the same problem with misclassification as OVS had >>> before the kernel fix 2061ecfdf235. >> >> Right. >> >>> >>> So, the similar fix should be implemented for TC as well. However, >>> I'm not sure how to actually do that, because ct and pedit are >>> not really connected in the kernel. The issue might be fixed as a >>> side effect from fixes for the issue #5 in the list here, I guess, >>> but it's not really a correct fix. The reason why it should be >>> fixed in the kernel is because user doesn't really know that TC >>> or openvswitch module cached that connection, the user didn't ask >>> it to be cached and re-used, they only wanted to populate the >>> current flow key with the ct_{state,mark,label} or commit some >>> changes. TC/openvswitch kernel module decided to cache the nfct, >>> so it should handle possible mismatch if the packet got changed. >>> >>> Does that make sense? >> >> Indeed changing the tc pedit action is not a possibility. >> >> We did copy the caching optimization from ovs when implementing tc act_ct. >> >> I wonder if we could remove the optimization. >> According to the comment in the code the caching mechanism was designed >> to optimize the ct(commit) execution, as ovs connections have to be >> explicitly >> commited. >> Perhaps we can also consider the other approach that you suggested, verifying >> that the cached 5-tuples was not changed. >> >> However, I do remember that OVN pod to external pipeline actually relies on >> this optimization when executing ct(nat) -> recirc -> ct for identical zones. >> Without the optimization the second ct would miss because the natted entry >> was >> never commited to the ct table. > > Hmm. This is worrying. I would not expect removal of the optimization > to affect the correctness. It should only matter performance-wise. > > Numan, Dumitru, can you comment on this? Does OVN really expect the natted > packet to not miss the ct lookup even though it was never committed?
As far as I know, since https://github.com/ovn-org/ovn/commit/0038579d192802fff03c3594e4f85dab4f7af2bd OVN will not do two subsequent CT lookups in the same zone. > > <snip the rest of the issues> > > Best regards, Ilya Maximets. > Regards, Dumitru _______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss