On Tue, Jun 7, 2022 at 12:16 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 5/31/22 23:48, Ilya Maximets wrote:
> > On 5/31/22 21:15, Frode Nordahl wrote:
> >> On Mon, May 30, 2022 at 5:25 PM Frode Nordahl
>
> <snip>
>
> >> I've pushed the first part of the fix here:
> >> https://mail.openvswitch.org/pipermail/ovs-dev/2022-May/394450.html
> >
> > Thanks!  I saw that and I tend to think that it is correct.
> > I'll try to test it and apply in the next couple of days.
> >
> > One question about the test above: which entity actually adds
> > the ct_state to the packet or at which moment that happens?
> > I see it, but I'm not sure I fully understand that.  Looks
> > like I'm missing smething obvious.
>
> I found what is going on there - kernel simply tracks everything
> if not told to do so, so ICMP packets create the ct entry and
> subsequent packets re-use it, so icmp replies have +trk set while
> entering OVS.

Great, my hunch was that something along these lines was happening as
well, I have to admit the test case was found by locating something
closest to the real life use case and it proved to work as a good test
for this condition.

> ----
>
> Let's have some summary of the issues discovered here so far,
> including a few new issues:
>
> 1. ct states set externally are not tracked correctly by OVS.
>    Fix: https://mail.openvswitch.org/pipermail/ovs-dev/2022-May/394450.html
>    Status:  LGTM, will apply soon.
>    This fixes the problem originally reported by Liam, IIUC.  Right?

Yes.

> 2. Kernel ct() actions are trying to re-use the cached connection
>    after the tuple changes.
>    This ends up to be the OVN hairpin issue as reported here:
>      https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
>
>    Proposed Fix:
>      
> https://lore.kernel.org/netdev/20220606221140.488984-1-i.maxim...@ovn.org/T/#u
>
>    Status: Needs review.

I can confirm that the proposed fix resolves the OVN hairpin issue. It
also looks simple enough to be backportable all the way to where we
would need it (kernel 5.4.0). I'll have a look at giving this wider
exposure in an internal CI environment as a canary for any unintended
consequences if that would be helpful.

> 3. ct_clear is not a reversible action, because it required
>    to pass the packet through conntrack again in order to restore
>    the conntrack state.
>
>    Fix:     To move the CT_CLEAR to a group of irreversible actions
>             in the reversible_actions() in ofproto/ofproto-dpif-xlate.c.
>
>    Status: An easy fix that does nothing useful without fixes for
>            later issues, because we clear the ct_state before
>            the patch_port_output() processing and optimizing out
>            the CT_CLEAR action.
>
>
> 4. (new!) reversible_actions() optimization is logically incorrect.
>    The reason is that reversible_actions() doesn't look further
>    than a list of actions of the first OF rule in the second bridge.
>    If the list of action contains resubmit, there can still be
>    irreversible actions and packet will be irreversibly modified by
>    the patch port processing without cloning.
>
>    Simple reproducer:
>
> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
> index dbb3b6dda..be30ad5bf 100644
> --- a/tests/ofproto-dpif.at
> +++ b/tests/ofproto-dpif.at
> @@ -8736,7 +8736,8 @@ OVS_VSWITCHD_START(
>  AT_CHECK([ovs-ofctl -O OpenFlow13 add-meter br1 'meter=1 pktps stats 
> bands=type=drop rate=2'])
>  AT_CHECK([ovs-ofctl del-flows br0])
>  AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 in_port=local,ip,actions=2,1])
> -AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br1 
> in_port=1,ip,actions=meter:1,3])
> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br1 
> "in_port=1,ip,actions=resubmit(,7)"])
> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br1 
> table=7,in_port=1,ip,actions=meter:1,3])
>
>  AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 
> 'in_port(100),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x0800),ipv4(src=10.1.1.22,dst=10.0.0.3,proto=6,tos=0,ttl=64,frag=no),tcp(src=53295,dst=8080)'],
>  [0], [stdout])
>  AT_CHECK([tail -1 stdout], [0],
> ---
>
>    Status: unclear.
>    One idea is to reverse the logic and check datapath actions for
>    being reversible when going up in recursion instead of trying to
>    predict all the future actions while going down.
>
>    Ideas are welcome.
>    Hammerhead approach: Mark 'resubmit' as irreversible action.  But that
>    will effectively mean that we're always cloning on patch port output,
>    which is not great, see the issue #8.
>
>
> 5. Packets after the ct() in a non-forked pipeline must be untracked.
>    Status: unclear.
>
>    Fix for the issue #2 will cover issues for already tracked packets,
>    but if the packet is supposed to be untracked, but it is tracked,
>    then we must emit the ct_clear action from the userspace.
>
>    The most viable approach, I guess, is the previously proposed
>    'pending_ct_clear' fix.  Alternative is to always emit ct_clear
>    right after the ct() action and not loose our minds trying to
>    track the pending action in all the weird cases.
>
>
> 6. Conntrack state must not be preserved while going through the
>    patch port.
>
>    State: unclear.
>
>    The right solution would be to emit the ct_clear datapath action
>    after cloning for the patch port.  Few problems here:
>    - current code in ofproto/ofproto-dpif-xlate.c will not actually
>      allow us to do that, we need to fix the issue #4 first to be able to
>      inject new actions post factum.
>    - ct_clear is non-reversible (issue #3) meaning if we're adding
>      ct-clear, we have to clone even if the patch port processing
>      doesn't have any ct() actions or recirculations.
>      Excessive ct_clear's and clones can be avoided by fixing the
>      issue #4 and examining the list of datapath actions created by
>      the patch port processing afterwards to decide if we need to
>      clone or insert ct_clear.
>
>
> 7. (new!) Userspace conntrack implementation fails to perform correct
>    lookup if the cached connection was cleared during the tuple change.
>    The issue can be seen with the OVN hairpin issue reproducer:
>       https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856/comments/6
>    by running 'make check-system-userspace'.
>
>    I'm not 100% sure what is going on there, but the test fails because
>    packets are marked as invalid due to conn_not_found() -> valid_new()
>    failure for the icmp reply.
>
>
> 8. Not really a bug, but all above implies adding more clone()'s and
>    they are not playing well with HW offloading, i.e. simply breaking it.
>    So, it makes sense to think how we can minimize the impact.

Thank you for breaking down and detailing all of the issues we have
discovered as part of this discussion (and some more!).  HW offloading
is important to us as well and OVS/OVN plays an important part in
developing the technology into something that will soon be part of
everyone's daily lives.  It's hard work to retain the full
flexibility/programmability of OVS in this world and I'm convinced it
will be worth it.

I'll have a look and contribute with any thoughts.  I also wonder if
any of the hardware guys here have opinions and want to chime in on
how we best solve these challenges?

-- 
Frode Nordahl

> Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to