On Tue, Jul 12, 2022 at 11:55 AM Mark Michelson <[email protected]> wrote:
>
> Hi Ihar, I have a few comments below.
>
> On 7/9/22 01:33, Ihar Hrachyshka wrote:
> > When multiple chassis are fighting for the same port (requested-chassis
> > is not set, e.g. for gateway ports), they may produce an unreasonable
> > number of chassis field updates in a very short time frame (hundreds of
> > updates in several seconds). This puts unnecessary load on OVN as well
> > as any db notification consumers trying to keep up with the barrage.
> >
> > This patch throttles port claim attempts so that they don't happen more
> > frequently than once per 0.5 seconds.
> >
> > Reported: https://bugzilla.redhat.com/show_bug.cgi?id=1974898
> > Signed-off-by: Ihar Hrachyshka <[email protected]>
> > ---
> > controller/binding.c | 119 +++++++++++++++++++++++++++++++++---
> > controller/binding.h | 8 +++
> > controller/ovn-controller.c | 48 +++++++++++++++
> > tests/ovn.at | 41 +++++++++++++
> > 4 files changed, 208 insertions(+), 8 deletions(-)
> >
> > diff --git a/controller/binding.c b/controller/binding.c
> > index 1eeca44a9..133ef1ad0 100644
> > --- a/controller/binding.c
> > +++ b/controller/binding.c
> > @@ -48,6 +48,54 @@ VLOG_DEFINE_THIS_MODULE(binding);
> >
> > #define OVN_QOS_TYPE "linux-htb"
> >
> > +#define CLAIM_TIME_THRESHOLD_MS 500
> > +
> > +struct claimed_port {
> > + long long int last_claimed;
> > +};
> > +
> > +static struct shash _claimed_ports = SHASH_INITIALIZER(&_claimed_ports);
>
> One thing I noticed is that claimed ports are never removed from this
> shash until ovn-controller shuts down. If ovn-controller runs for a long
> time and has many ports come and go, this shash can grow very large,
> presumably with long-irrelevant ports in it.
>
> It may be a good idea to add a periodic traversal of _claimed_ports and
> remove any claimed ports whose last_claimed is more than
> CLAIM_TIME_THRESHOLD_MS (or some multiple of CLAIM_TIME_THRESHOLD_MS to
> avoid malloc/free churn). Since you already have a failsafe in
> get_claim_timestamp() that returns 0 if the port is not in
> _claimed_ports, this would still allow the postponement mechanism to
> work as intended.
>
> I don't think you need to schedule a poll wakeup just for this. I think
> this traversal can be performed during binding_run() and
> binding_handle_port_binding_changes() and the claimed ports will
> naturally get cleaned up as ovn-controller continues to run.
>
Good idea, thanks. Adding in the next revision in a sec.
> > +static struct sset _postponed_ports = SSET_INITIALIZER(&_postponed_ports);
> > +
> > +struct sset *
> > +get_postponed_ports(void)
> > +{
> > + return &_postponed_ports;
> > +}
> > +
> > +static long long int
> > +get_claim_timestamp(const char *port_name)
> > +{
> > + struct claimed_port *cp = shash_find_data(&_claimed_ports, port_name);
> > + if (!cp) {
> > + return 0;
> > + }
> > + return cp->last_claimed;
> > +}
> > +
> > +static void
> > +register_claim_timestamp(const char *port_name, long long int t)
> > +{
> > + struct claimed_port *cp = shash_find_data(&_claimed_ports, port_name);
> > + if (!cp) {
> > + cp = xzalloc(sizeof *cp);
> > + shash_add(&_claimed_ports, port_name, cp);
> > + }
> > + cp->last_claimed = t;
> > +}
> > +
> > +void
> > +schedule_postponed_ports(void)
> > +{
> > + const char *port_name;
> > + SSET_FOR_EACH_SAFE (port_name, &_postponed_ports) {
> > + long long int t = get_claim_timestamp(port_name);
> > + if (t) {
> > + poll_timer_wait_until(t + CLAIM_TIME_THRESHOLD_MS);
> > + }
> > + }
> > +}
> > +
> > struct qos_queue {
> > struct hmap_node node;
> > uint32_t queue_id;
> > @@ -1045,6 +1093,23 @@ remove_additional_chassis(const struct
> > sbrec_port_binding *pb,
> > remove_additional_encap_for_chassis(pb, chassis_rec);
> > }
> >
> > +static bool
> > +lport_maybe_postpone(const char *port_name, long long int now,
> > + struct sset *postponed_ports)
> > +{
> > + long long int last_claimed = get_claim_timestamp(port_name);
> > + if (now - last_claimed >= CLAIM_TIME_THRESHOLD_MS) {
> > + return false;
> > + }
> > +
> > + sset_add(postponed_ports, port_name);
> > + VLOG_DBG("Postponed claim on logical port %s.", port_name);
> > +
> > + /* Schedule the next attempt. */
> > + poll_timer_wait_until(last_claimed + CLAIM_TIME_THRESHOLD_MS);
> > + return true;
> > +}
> > +
> > /* Returns false if lport is not claimed due to 'sb_readonly'.
> > * Returns true otherwise.
> > */
> > @@ -1055,14 +1120,16 @@ claim_lport(const struct sbrec_port_binding *pb,
> > const struct ovsrec_interface *iface_rec,
> > bool sb_readonly, bool notify_up,
> > struct hmap *tracked_datapaths,
> > - struct if_status_mgr *if_mgr)
> > + struct if_status_mgr *if_mgr,
> > + struct sset *postponed_ports)
> > {
> > if (!sb_readonly) {
> > claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up,
> > if_mgr);
> > }
> >
> > enum can_bind can_bind = lport_can_bind_on_this_chassis(chassis_rec,
> > pb);
> > - bool update_tracked = false;
> > + bool claimed = false;
> > + long long int now = time_msec();
> >
> > if (can_bind == CAN_BIND_AS_MAIN) {
> > if (pb->chassis != chassis_rec) {
> > @@ -1070,6 +1137,10 @@ claim_lport(const struct sbrec_port_binding *pb,
> > return false;
> > }
> >
> > + if (lport_maybe_postpone(pb->logical_port, now,
> > postponed_ports)) {
> > + return true;
> > + }
> > +
> > if (pb->chassis) {
> > VLOG_INFO("Changing chassis for lport %s from %s to %s.",
> > pb->logical_port, pb->chassis->name,
> > @@ -1086,7 +1157,7 @@ claim_lport(const struct sbrec_port_binding *pb,
> > if (is_additional_chassis(pb, chassis_rec)) {
> > remove_additional_chassis(pb, chassis_rec);
> > }
> > - update_tracked = true;
> > + claimed = true;
> > }
> > } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> > if (!is_additional_chassis(pb, chassis_rec)) {
> > @@ -1094,6 +1165,10 @@ claim_lport(const struct sbrec_port_binding *pb,
> > return false;
> > }
> >
> > + if (lport_maybe_postpone(pb->logical_port, now,
> > postponed_ports)) {
> > + return true;
> > + }
> > +
> > VLOG_INFO("Claiming lport %s for this additional chassis.",
> > pb->logical_port);
> > for (size_t i = 0; i < pb->n_mac; i++) {
> > @@ -1104,12 +1179,16 @@ claim_lport(const struct sbrec_port_binding *pb,
> > if (pb->chassis == chassis_rec) {
> > sbrec_port_binding_set_chassis(pb, NULL);
> > }
> > - update_tracked = true;
> > + claimed = true;
> > }
> > }
> >
> > - if (update_tracked && tracked_datapaths) {
> > - update_lport_tracking(pb, tracked_datapaths, true);
> > + if (claimed) {
> > + if (tracked_datapaths) {
> > + update_lport_tracking(pb, tracked_datapaths, true);
> > + }
> > + register_claim_timestamp(pb->logical_port, now);
> > + sset_find_and_delete(postponed_ports, pb->logical_port);
> > }
> >
> > /* Check if the port encap binding, if any, has changed */
> > @@ -1271,7 +1350,8 @@ consider_vif_lport_(const struct sbrec_port_binding
> > *pb,
> > b_lport->lbinding->iface,
> > !b_ctx_in->ovnsb_idl_txn,
> > !parent_pb, b_ctx_out->tracked_dp_bindings,
> > - b_ctx_out->if_mgr)){
> > + b_ctx_out->if_mgr,
> > + b_ctx_out->postponed_ports)) {
> > return false;
> > }
> >
> > @@ -1567,7 +1647,8 @@ consider_nonvif_lport_(const struct
> > sbrec_port_binding *pb,
> > return claim_lport(pb, NULL, b_ctx_in->chassis_rec, NULL,
> > !b_ctx_in->ovnsb_idl_txn, false,
> > b_ctx_out->tracked_dp_bindings,
> > - b_ctx_out->if_mgr);
> > + b_ctx_out->if_mgr,
> > + b_ctx_out->postponed_ports);
> > }
> >
> > if (pb->chassis == b_ctx_in->chassis_rec ||
> > @@ -2772,6 +2853,21 @@ delete_done:
> > }
> > }
> >
> > + /* Also handle any postponed (throttled) ports. */
> > + const char *port_name;
> > + SSET_FOR_EACH_SAFE (port_name, b_ctx_out->postponed_ports) {
> > + pb = lport_lookup_by_name(b_ctx_in->sbrec_port_binding_by_name,
> > + port_name);
> > + if (!pb) {
> > + sset_find_and_delete(b_ctx_out->postponed_ports, port_name);
> > + continue;
> > + }
> > + handled = handle_updated_port(b_ctx_in, b_ctx_out, pb,
> > qos_map_ptr);
> > + if (!handled) {
> > + break;
> > + }
> > + }
> > +
> > if (handled && qos_map_ptr && set_noop_qos(b_ctx_in->ovs_idl_txn,
> > b_ctx_in->port_table,
> > b_ctx_in->qos_table,
> > @@ -3214,3 +3310,10 @@ ovs_iface_matches_lport_iface_id_ver(const struct
> > ovsrec_interface *iface,
> >
> > return true;
> > }
> > +
> > +void
> > +binding_destroy(void)
> > +{
> > + shash_destroy_free_data(&_claimed_ports);
> > + sset_clear(&_postponed_ports);
> > +}
> > diff --git a/controller/binding.h b/controller/binding.h
> > index 1fed06674..161057592 100644
> > --- a/controller/binding.h
> > +++ b/controller/binding.h
> > @@ -103,6 +103,8 @@ struct binding_ctx_out {
> > struct hmap *tracked_dp_bindings;
> >
> > struct if_status_mgr *if_mgr;
> > +
> > + struct sset *postponed_ports;
> > };
> >
> > /* Local bindings. binding.c module binds the logical port (represented by
> > @@ -219,4 +221,10 @@ struct binding_lport {
> > size_t n_port_security;
> > };
> >
> > +struct sset *get_postponed_ports(void);
> > +void schedule_postponed_ports(void);
> > +
> > +/* Clean up module state. */
> > +void binding_destroy(void);
> > +
> > #endif /* controller/binding.h */
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index 69615308e..851f53ce0 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -1175,6 +1175,42 @@ en_activated_ports_run(struct engine_node *node,
> > void *data_)
> > engine_set_node_state(node, state);
> > }
> >
> > +struct ed_type_postponed_ports {
> > + struct sset *postponed_ports;
> > +};
> > +
> > +static void *
> > +en_postponed_ports_init(struct engine_node *node OVS_UNUSED,
> > + struct engine_arg *arg OVS_UNUSED)
> > +{
> > + struct ed_type_postponed_ports *data = xzalloc(sizeof *data);
> > + data->postponed_ports = get_postponed_ports();
> > + return data;
> > +}
> > +
> > +static void
> > +en_postponed_ports_cleanup(void *data_)
> > +{
> > + struct ed_type_postponed_ports *data = data_;
> > + if (!data->postponed_ports) {
> > + return;
> > + }
> > + data->postponed_ports = NULL;
> > +}
> > +
> > +static void
> > +en_postponed_ports_run(struct engine_node *node, void *data_)
> > +{
> > + struct ed_type_postponed_ports *data = data_;
> > + enum engine_node_state state = EN_UNCHANGED;
> > + data->postponed_ports = get_postponed_ports();
> > + if (!sset_is_empty(data->postponed_ports)) {
> > + schedule_postponed_ports();
> > + state = EN_UPDATED;
> > + }
> > + engine_set_node_state(node, state);
> > +}
> > +
> > struct ed_type_runtime_data {
> > /* Contains "struct local_datapath" nodes. */
> > struct hmap local_datapaths;
> > @@ -1205,6 +1241,8 @@ struct ed_type_runtime_data {
> >
> > struct shash local_active_ports_ipv6_pd;
> > struct shash local_active_ports_ras;
> > +
> > + struct sset *postponed_ports;
> > };
> >
> > /* struct ed_type_runtime_data has the below members for tracking the
> > @@ -1405,6 +1443,7 @@ init_binding_ctx(struct engine_node *node,
> > b_ctx_out->egress_ifaces = &rt_data->egress_ifaces;
> > b_ctx_out->lbinding_data = &rt_data->lbinding_data;
> > b_ctx_out->local_iface_ids = &rt_data->local_iface_ids;
> > + b_ctx_out->postponed_ports = rt_data->postponed_ports;
> > b_ctx_out->tracked_dp_bindings = NULL;
> > b_ctx_out->if_mgr = ctrl_ctx->if_mgr;
> > }
> > @@ -1442,6 +1481,10 @@ en_runtime_data_run(struct engine_node *node, void
> > *data)
> > local_binding_data_init(&rt_data->lbinding_data);
> > }
> >
> > + struct ed_type_postponed_ports *pp_data =
> > + engine_get_input_data("postponed_ports", node);
> > + rt_data->postponed_ports = pp_data->postponed_ports;
> > +
> > struct binding_ctx_in b_ctx_in;
> > struct binding_ctx_out b_ctx_out;
> > init_binding_ctx(node, rt_data, &b_ctx_in, &b_ctx_out);
> > @@ -3536,6 +3579,7 @@ main(int argc, char *argv[])
> > ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
> > ENGINE_NODE(ofctrl_is_connected, "ofctrl_is_connected");
> > ENGINE_NODE_WITH_CLEAR_TRACK_DATA(activated_ports, "activated_ports");
> > + ENGINE_NODE(postponed_ports, "postponed_ports");
> > ENGINE_NODE(pflow_output, "physical_flow_output");
> > ENGINE_NODE_WITH_CLEAR_TRACK_DATA(lflow_output,
> > "logical_flow_output");
> > ENGINE_NODE(flow_output, "flow_output");
> > @@ -3675,6 +3719,9 @@ main(int argc, char *argv[])
> > runtime_data_sb_datapath_binding_handler);
> > engine_add_input(&en_runtime_data, &en_sb_port_binding,
> > runtime_data_sb_port_binding_handler);
> > + /* Reuse the same handler for any previously postponed ports. */
> > + engine_add_input(&en_runtime_data, &en_postponed_ports,
> > + runtime_data_sb_port_binding_handler);
> >
> > /* The OVS interface handler for runtime_data changes MUST be executed
> > * after the sb_port_binding_handler as port_binding deletes must be
> > @@ -4312,6 +4359,7 @@ loop_done:
> > lflow_destroy();
> > ofctrl_destroy();
> > pinctrl_destroy();
> > + binding_destroy();
> > patch_destroy();
> > if_status_mgr_destroy(if_mgr);
> > shash_destroy(&vif_plug_deleted_iface_ids);
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index 6e35bb5b8..3411e3a34 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -15392,6 +15392,47 @@ OVN_CLEANUP([hv1],[hv2])
> > AT_CLEANUP
> > ])
> >
> > +OVN_FOR_EACH_NORTHD([
> > +AT_SETUP([tug-of-war between two chassis for the same port])
> > +ovn_start
> > +
> > +ovn-nbctl ls-add ls0
> > +ovn-nbctl lsp-add ls0 lsp0
> > +
> > +net_add n1
> > +for i in 1 2; do
> > + sim_add hv$i
> > + as hv$i
> > + ovs-vsctl add-br br-phys
> > + ovn_attach n1 br-phys 192.168.0.$i
> > +done
> > +
> > +for i in 1 2; do
> > + as hv$i
> > + ovs-vsctl -- add-port br-int vif \
> > + -- set Interface vif external-ids:iface-id=lsp0
> > +done
> > +
> > +# give controllers some time to fight for the port binding
> > +sleep 3
> > +
> > +# calculate the number of port claims registered by each fighting chassis
> > +hv1_claims=$(as hv1 grep -c 'Claiming\|Changing chassis'
> > hv1/ovn-controller.log)
> > +hv2_claims=$(as hv2 grep -c 'Claiming\|Changing chassis'
> > hv2/ovn-controller.log)
> > +
> > +echo $hv1_claims
> > +echo $hv2_claims
>
> Nit: For debuggability purposes, these echos should probably have some
> more context, like:
>
> echo "hv1 claimed ${hv1_claims} times"
>
> This makes it a bit easier to read the testsuite logs instead of just
> seeing an arbitrary "7" or whatever.
>
Huh. This was actually a debug message that I haven't meant to push
but now that it's there, I'll keep it by adopting your suggestion. :)
> > +
> > +# check that neither registered an outrageous number of port claims
> > +max_claims=10
> > +AT_CHECK([test "${hv1_claims}" -le "${max_claims}"], [0], [])
> > +AT_CHECK([test "${hv2_claims}" -le "${max_claims}"], [0], [])
> > +
> > +OVN_CLEANUP([hv1],[hv2])
> > +
> > +AT_CLEANUP
> > +])
> > +
> > OVN_FOR_EACH_NORTHD([
> > AT_SETUP([options:requested-chassis with hostname])
> >
> >
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev