Hi Ben et al.

We recently hit an issue where OVS crashed because it ran out of stack space while processing an OVN flow loop :) I was hoping it would break out of the loop early, but due to commit 790c5d269 ("ofproto-dpif: Do not count resubmit to later tables against limit."), the resubmit loop can go up to 4K resubmits deep.

When the clone action (among others) is used, stack usage increases quite drastically; some tests showed that over 19M of stack was needed to reach the 4K limit. Even a simple pair of resubmits jumping back and forth until 4K is reached requires a 3.5M stack.
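To put those numbers in per-level terms, here is a rough back-of-the-envelope helper. It assumes "M" means MiB and "4K" means 4096 resubmit levels; both are my reading of the figures, and the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MIB UINT64_C(1048576)   /* Assumption: "M" above means MiB. */

/* Average stack bytes consumed per resubmit level, given the total stack
 * needed to reach 'levels' nested resubmits.  Hypothetical helper, not
 * part of OVS. */
uint64_t
stack_bytes_per_level(uint64_t total_stack_bytes, uint64_t levels)
{
    assert(levels > 0);
    return total_stack_bytes / levels;
}
```

Under those assumptions, 19M over 4096 levels is roughly 4.75 kB of stack per level for the clone case, and 3.5M is a bit under 1 kB per level for the plain resubmit ping-pong.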

Some small changes, like using malloc for mf_subvalue and for actset_stub in clone_xlate_actions(), brought the worst case down from around 19M to 12M, but this is still a lot of stack memory.
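As a generic illustration of that kind of change (not the actual patch), the sketch below moves a large per-level scratch buffer from the stack to the heap inside a recursive function. SCRATCH_SIZE and walk_heap() are hypothetical names, and the buffer only stands in for things like actset_stub.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE 1024   /* Hypothetical per-level scratch size. */

/* Recurses from 'depth' down to 'max_depth'.  Each level needs a scratch
 * buffer; allocating it with malloc() keeps only a pointer in the stack
 * frame instead of SCRATCH_SIZE bytes.  Returns the number of levels
 * visited, or -1 if an allocation failed. */
int
walk_heap(unsigned int depth, unsigned int max_depth)
{
    assert(depth <= max_depth);

    uint8_t *scratch = malloc(SCRATCH_SIZE);
    if (!scratch) {
        return -1;
    }
    memset(scratch, 0, SCRATCH_SIZE);
    scratch[0] = 1;             /* Stand-in for real per-level work. */

    int below = 0;
    if (depth < max_depth) {
        below = walk_heap(depth + 1, max_depth);
    }

    int result = below < 0 ? -1 : below + scratch[0];
    free(scratch);              /* Released once this level is done. */
    return result;
}
```

The trade-off is one malloc()/free() pair per level, and the recursion itself still costs a frame per level, which matches the observation that this only reduces the worst case rather than removing it.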

One idea would be, on the last action in a list, to unwind the stack (recursion) to the previous non-final action and only then continue processing this last action. I'm not too familiar with the xlate code, but it already looks quite complex, so I'm not sure this is an option :) I'm also not sure it gives us enough relief in all OVN scenarios, as they use a lot of resubmits in a single action list.
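A toy model of that idea, treating a resubmit in the final position as a tail call that is handled by looping rather than recursing. Everything here (struct names, run_table(), the action encoding) is invented for illustration and is far simpler than the real xlate code.

```c
#include <assert.h>
#include <stddef.h>

enum act_type { ACT_OUTPUT, ACT_RESUBMIT };
struct action { enum act_type type; int arg; };   /* arg: port or table. */
struct table { const struct action *acts; size_t n; };

struct xctx {
    const struct table *tables;
    size_t n_tables;
    int resubmits, max_resubmits;   /* Resubmit budget, e.g. 4096. */
    int depth, max_depth_seen;      /* Tracks real recursion depth. */
    int outputs;                    /* Number of output actions run. */
};

void
run_table(struct xctx *ctx, int table_id)
{
    if (++ctx->depth > ctx->max_depth_seen) {
        ctx->max_depth_seen = ctx->depth;
    }
    while (table_id >= 0) {
        assert((size_t) table_id < ctx->n_tables);
        const struct table *t = &ctx->tables[table_id];
        int next = -1;

        for (size_t i = 0; i < t->n; i++) {
            const struct action *a = &t->acts[i];

            if (a->type == ACT_OUTPUT) {
                ctx->outputs++;
            } else if (ctx->resubmits < ctx->max_resubmits) {
                ctx->resubmits++;
                if (i == t->n - 1) {
                    next = a->arg;            /* Final action: loop. */
                } else {
                    run_table(ctx, a->arg);   /* Non-final: recurse. */
                }
            }   /* Resubmits beyond the budget are skipped. */
        }
        table_id = next;
    }
    ctx->depth--;
}
```

With two tables that resubmit to each other as their last action, this model runs all 4096 resubmits at a recursion depth of 1. Any resubmit that is not the last action in its list still recurses, so lists with many resubmits would see less benefit.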

Another idea Dumitru had was to delay clone() execution until you get back to the root action set. So when you hit a clone() action, you store the state in the ctx and go over the list of stored clones once you return (this could result in a growing list). That way you do not end up processing the clones on the branch of the tree. The only problem is that this results in out-of-order processing of the action list, e.g. "clone(resubmit(,5)), 2" will first send out the packet on 2 and only then on the destination of the clone() action. I guess this is a blocker, as the OpenFlow specification says action lists should be executed in order.
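A minimal sketch of the deferred-clone idea and its ordering side effect. The types, the fixed-size pending list, and the function names are all hypothetical; a real version would also have to snapshot the flow state for each deferred clone, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

enum act_type { ACT_OUTPUT, ACT_CLONE_RESUBMIT };
struct action { enum act_type type; int arg; };   /* arg: port or table. */
struct table { const struct action *acts; size_t n; };

#define MAX_PENDING 64   /* Hypothetical cap on deferred clones. */
#define MAX_LOG 64       /* Hypothetical cap on recorded outputs. */

struct xctx {
    const struct table *tables;
    size_t n_tables;
    int pending[MAX_PENDING];   /* Tables queued by clone(resubmit(,N)). */
    size_t n_pending;
    int log[MAX_LOG];           /* Output ports, in execution order. */
    size_t n_log;
};

/* Runs one action list.  A clone is not executed here, only queued. */
static void
run_list(struct xctx *ctx, int table_id)
{
    assert((size_t) table_id < ctx->n_tables);
    const struct table *t = &ctx->tables[table_id];

    for (size_t i = 0; i < t->n; i++) {
        const struct action *a = &t->acts[i];

        if (a->type == ACT_OUTPUT) {
            if (ctx->n_log < MAX_LOG) {
                ctx->log[ctx->n_log++] = a->arg;
            }
        } else if (ctx->n_pending < MAX_PENDING) {
            ctx->pending[ctx->n_pending++] = a->arg;
        }
    }
}

/* Runs the root list, then drains the queue.  Draining can queue more
 * clones, so the list may grow while it is being walked. */
void
xlate_deferred(struct xctx *ctx, int root_table)
{
    run_list(ctx, root_table);
    for (size_t i = 0; i < ctx->n_pending; i++) {
        run_list(ctx, ctx->pending[i]);
    }
}
```

If table 0 holds "clone(resubmit(,5)), 2" and table 5 outputs to some port (say 7, picked arbitrarily), the recorded order is 2 then 7, whereas in-order execution would give 7 then 2. The recursion depth stays flat, but that reordering is exactly the concern raised above.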

Any other ideas on the above or on how to optimize the stack usage?

Cheers,

Eelco

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
