Hi Ales, Thank you for the review i addressed all your comments in v2. regarding the test actually, it's a bit hard to test that using a unit test because we need to let ovn-sb ignore the first binding request which is not applicable using a unit test, i was testing that manually by dropping the connection to the SB that will drop the first bind request but it a bit hard to do that i an unit test.
On Fri, Feb 2, 2024 at 1:21 PM Ales Musil <amu...@redhat.com> wrote: > > > On Fri, Feb 2, 2024 at 12:19 PM Ales Musil <amu...@redhat.com> wrote: > >> >> >> On Tue, Jan 30, 2024 at 2:59 PM Mohammad Heib <mh...@redhat.com> wrote: >> >>> ovn-controller immediately removes the vport_bindings requests that were >>> generated by VIFs after handling them locally, this approach is intended >>> to avoid binding the vport to one VIF only and allocate the vport >>> between the different VIFs that exist in the vport:virtual-parents. >>> >>> Although the behavior mentioned above is correct, in some cases when the >>> SB Database is busy the transaction that binds this vport to the desired >>> VIF/chassis can fail and the controller will not re-try to bind the >>> vport again because we deleted the bind_vport request in the previous >>> loop/TXN. >>> >>> This patch aims to change the above behavior by storing the bind_vport >>> requests for a bit longer time and this is done by the following: >>> 1. add relevancy_time for each new bind_vport request and >>> mark this request as new. >>> >>> 2. loop0: ovn-controller will try to handle this bind_vport request >>> for the first time as usual (no change). >>> >>> 3. loop0: ovn-controller will try to delete the already handled >>> bind_vport >>> request as usual but first, it will check if this request is >>> marked as new and >>> if the relevancy_time is still valid if so the controller will >>> mark this >>> request as an old request and keep it, otherwise remove it. >>> >>> 4.loop1: ovn-controller will try to commit the same change again for >>> the old request, if the previous commit in loop0 succeeded the >>> change will not have any effect on SB, otherwise we will try to >>> commit the same vport_bind request again. >>> >>> 5. loop1: delete the old bind_vport request. >>> >>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1954659 >>> Signed-off-by: Mohammad Heib <mh...@redhat.com> >>> --- >>> >> >> Hi Mohammad, >> >> overall the change makes sense, I have a couple of comments see down >> below. >> >> controller/pinctrl.c | 58 +++++++++++++++++++++++++++++++++++++++----- >>> 1 file changed, 52 insertions(+), 6 deletions(-) >>> >>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c >>> index bd3bd3d81..152962448 100644 >>> --- a/controller/pinctrl.c >>> +++ b/controller/pinctrl.c >>> @@ -6519,10 +6519,52 @@ struct put_vport_binding { >>> uint32_t vport_key; >>> >>> uint32_t vport_parent_key; >>> + >>> + /* This vport record Only relevant if "relevancy_time" >>> + * is earlier than the current_time, "new_record" is true. >>> + */ >>> + long long int relevancy_time; >>> >> >> The intention of the variable should be probably clearer e.g. >> relevant_until_ms. >> >> Also reading through the rest of the code it doesn't seem possible that >> the binding wouldn't be deleted, hence I think there isn't any need for the >> relevancy time, it should be enough to have a flag that will be flipped. In >> any case we will try to commit twice. I would leave out the whole relevancy >> and keep the flag flipping it on first commit WDYT? >> >> >>> + bool new_record; >>> }; >>> >>> /* Contains "struct put_vport_binding"s. */ >>> static struct hmap put_vport_bindings; >>> +/* the relevance time in ms of vport record before deleteing. */ >>> +#define VPORT_RELEVANCE_TIME 1500 >>> + >>> +/* >>> + * Validate if the vport_binding record that was added >>> + * by the pinctrl thread is still relevant and needs >>> + * to be updated in the SBDB or not. >>> + * >>> + * vport_binding record is only relevant and needs to be updated in SB >>> if: >>> + * 1. The put_vport_binding:relevancy_time still valid. >>> + * 2. The put_vport_binding:new_record is true: >>> + * The new_record will be set to "true" when this vport record is >>> created >>> + * by function "pinctrl_handle_bind_vport". >>> + * >>> + * After the first attempt to bind this vport to the chassis and >>> + * virtual_parent by function "run_put_vport_bindings" we will >>> set the >>> + * value of vpb:new_record to "false" and keep it in >>> "put_vport_bindings" >>> + * >>> + * After the second attempt of binding the vpb it will be removed >>> by >>> + * this function. >>> + * >>> + * The above guarantees that we will try to bind the vport twice >>> in >>> + * a certain amount of time. >>> + * >>> +*/ >>> +static bool >>> +is_vport_binding_relevant(struct put_vport_binding *vpb) >>> +{ >>> + long long int cur_time = time_msec(); >>> + >>> + if (vpb->new_record && vpb->relevancy_time > cur_time) { >>> + vpb->new_record = false; >>> >> >> This is a nasty side effect that I wouldn't expect from starting with >> is_. >> >> >>> + return true; >>> + } >>> + return false; >>> +} >>> >>> static void >>> init_put_vport_bindings(void) >>> @@ -6531,18 +6573,21 @@ init_put_vport_bindings(void) >>> } >>> >>> static void >>> -flush_put_vport_bindings(void) >>> +flush_put_vport_bindings(bool force_flush) >>> { >>> struct put_vport_binding *vport_b; >>> - HMAP_FOR_EACH_POP (vport_b, hmap_node, &put_vport_bindings) { >>> - free(vport_b); >>> + HMAP_FOR_EACH_SAFE (vport_b, hmap_node, &put_vport_bindings) { >>> + if (!is_vport_binding_relevant(vport_b) || force_flush) { >>> + hmap_remove(&put_vport_bindings, &vport_b->hmap_node); >>> + free(vport_b); >>> + } >>> } >>> } >>> >>> static void >>> destroy_put_vport_bindings(void) >>> { >>> - flush_put_vport_bindings(); >>> + flush_put_vport_bindings(true); >>> hmap_destroy(&put_vport_bindings); >>> } >>> >>> @@ -6620,7 +6665,7 @@ run_put_vport_bindings(struct ovsdb_idl_txn >>> *ovnsb_idl_txn, >>> sbrec_port_binding_by_key, chassis, vpb); >>> } >>> >>> - flush_put_vport_bindings(); >>> + flush_put_vport_bindings(false); >>> } >>> >>> /* Called with in the pinctrl_handler thread context. */ >>> @@ -6658,7 +6703,8 @@ pinctrl_handle_bind_vport( >>> vpb->dp_key = dp_key; >>> vpb->vport_key = vport_key; >>> vpb->vport_parent_key = vport_parent_key; >>> - >>> + vpb->new_record = true; >>> + vpb->relevancy_time = time_msec() + VPORT_RELEVANCE_TIME; >>> notify_pinctrl_main(); >>> } >>> >>> -- >>> 2.34.3 >>> >>> _______________________________________________ >>> dev mailing list >>> d...@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >>> >> Thanks, >> Ales >> -- >> >> Ales Musil >> >> Senior Software Engineer - OVN Core >> >> Red Hat EMEA <https://www.redhat.com> >> >> amu...@redhat.com >> <https://red.ht/sig> >> > > I forgot to mention, is it possible to come up with a test that would > check this behavior? > > -- > > Ales Musil > > Senior Software Engineer - OVN Core > > Red Hat EMEA <https://www.redhat.com> > > amu...@redhat.com > <https://red.ht/sig> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev