On Fri, Sep 30, 2022 at 11:50 AM Eelco Chaudron <[email protected]> wrote:
> > > On 29 Sep 2022, at 18:01, Ales Musil wrote: > > > On Fri, Sep 23, 2022 at 1:18 PM Eelco Chaudron <[email protected]> > wrote: > > > >> > >> > >> On 7 Sep 2022, at 8:54, Ales Musil wrote: > >> > >>> When the packet was traveling through patch port boundary > >>> OvS would check if any of the actions is reversible, > >>> if not it would clone the packet. However, the check > >>> was only at the first level of the second bridge. > >>> That caused some issues when the packet had gone > >>> through more actions, some of them might have been > >>> irreversible. > >>> > >>> In order to keep the semantics the same we might > >>> need to run the actions twice in the worst case > >>> scenario. During the clone there are 4 scenarios > >>> that can happen. > >>> > >>> 1) The action is last one for that flow, > >>> in that case we just continue without clone. > >>> > >>> 2) There is irreversible action in the action > >>> set (first level). In this case we know that > >>> there is at leas one irreversible action which > >>> is enough to do clone. > >>> > >>> 3) All actions in first level are reversible, > >>> we can try to run all actions as if we don't > >>> need any clone and inspect the ofpbuf at the > >>> end. In positive case there are no irreversible > >>> actions so we will just submit the buffer and continue. > >>> > >>> 4) This is same as 3) with the difference that > >>> there is irreversible action in the ofpbuf. > >>> To keep the semantics we need to re-run the actions > >>> and treat it as clone. This requires resotration > >>> of the xlate_ctx. > >>> > >>> Add test cases for all irreversible actions > >>> to see if they are properly cloned. > >>> > >>> Signed-off-by: Ales Musil <[email protected]> > >> > >> Some more comments on this v5, see inline below. > >> > > > > Thank you for the review. I will address those points in v6. > > > > Thank you! > > <SNIP> > > >> > >> This is the same as an invalid action, so I think we should return > false. > >> We should probably return false on each unknown option. > >> How about the following change? > >> > > > > Applied, but I think you have meant to return true, as any unknown action > > being irreversible by default? > > Yes, you are right they should be irreversible by default. > > <SNIP> > > >>> static void > >>> -clone_xlate_actions(const struct ofpact *actions, size_t actions_len, > >>> - struct xlate_ctx *ctx, bool is_last_action, > >>> - bool group_bucket_action OVS_UNUSED) > >>> +do_clone_xlate_actions(const struct ofpact *actions, size_t > actions_len, > >>> + struct xlate_ctx *ctx) > >>> { > >>> - struct ofpbuf old_stack = ctx->stack; > >>> - union mf_subvalue new_stack[1024 / sizeof(union mf_subvalue)]; > >> > >> For the stack and action_set you no longer reserve a clean array. > >> I think this was needed to stop it from leaking, see commit “b827b231”. > >> > > > > The commit states that it is a design decision, I don't mind having it as > > before. > > It just seemed to me that this won't harm, also before that commit it was > > not "clearing" > > the stack or action_set at all, which I do by moving the size back. This > > avoids > > taking up stack space. > > But with the old implementation, the stack size is 0, so you can not > retrieve anything from the stack. With your changes, the stack has all > entries on it as before, so in theory, you could get them from the stack. > However, I do not know the details on the use case/implementation for this, > so I have no way how this will affect OpenFlow. I’ve copied on Benn who > might still know this ;) Or maybe Ilya knows… > > > Let’s wait with sending out a v6 until this is resolved, so we can do one > final review. > > <SNIP> > > >>> -dp_clone_done: > >>> - /* The clone's conntrack execution should have no effect on the > >> original > >>> - * packet. */ > >>> - ctx->conntracked = old_conntracked; > >>> + /* Let's run the actions as if they are reversible, if we find out > >> that > >>> + * there was any irreversible action we can restore the xlate_ctx > >> and run > >>> + * the do_clone instead. If the action is last we don't have to > >> check and > >>> + * can just proceed with the actions. */ > >> > >> Any stats to suggest that this is the best approach? I’m wondering if > this > >> is the common case, or if the real clone is more common. If it's the > >> latter, we might want to swap the order. > >> > >> Maybe you can do some tests with OVN and add two coverage counters, one > >> for total invocation of this function and one for real clones, and see > the > >> results while running through a real-user OVN use case? > >> > > > > I will try to come up with some OVN tests that might give a better > > perspective which is more common. For start I'll run the counters over > our > > tests in OVN. > > Thanks! > > Hi, I got the counters from OVN tests, it should represent some pieces of normal workflows. On average the actual clone is happening in 0.229%. Which suggests that the order is correct. The raw data are available at https://pastebin.mozilla.org/VqKTaO2e for 21 days. First number are calls to clone_xlate_actions second number is how much of those were actual clones. Thanks, Ales -- Ales Musil Senior Software Engineer - OVN Core Red Hat EMEA <https://www.redhat.com> [email protected] IM: amusil <https://red.ht/sig> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
