On Mon, Nov 4, 2024 at 3:47 AM Ales Musil <[email protected]> wrote:
> > > On Wed, Oct 30, 2024 at 4:11 PM Tiago Pires <[email protected]> > wrote: > >> Hey Ales, >> >> Do you know if there is a way to reproduce the huge OF message that >> ovn-controller sends to ovs-vswitchd? >> I have a similar issue that this patch solve but I was not be able to >> reproduce it: >> >> #ovn-controller >> 5378238|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) >> (xid=0xbd25708): OFPBFC_MSG_BAD_LEN >> OFPT_BUNDLE_ADD_MESSAGE (OF1.5) (xid=0xbd25708): ***decode error: >> OFPBFC_MSG_BAD_LEN*** >> 00000000 06 22 00 10 0b d2 57 08-00 7a d1 b1 00 00 00 03 >> |."....W..z......| >> 5378239|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: disconnecting >> >> #OVS >> 1580559|connmgr|INFO|br-int<->unix#2544073: sending OFPBMC_BAD_LEN >> error reply to OFPT_BUNDLE_ADD_MESSAGE message >> >> Thanks >> >> Tiago Pires >> > > > Hi Tiago, > > AFAIR this was happening with a large number of LBs that > had "hairpin_snat_ip" set. For reproduction I was doing some > manual tinkering, unfortunately I don't remember the details. > I hope this helps at least a little bit. > > Best regards, > Ales > Hi Ales, I saw this issue also but I could not reproduce. We will apply the patch and see if the symptom will disappear. Thank you Tiago Pires > > >> >> On Thu, Sep 14, 2023 at 5:41 PM Mark Michelson <[email protected]> >> wrote: >> > >> > Thanks for the updates, Ales. >> > >> > I pushed this change to main and all branches back to 22.03. >> > >> > On 9/14/23 10:13, Ales Musil wrote: >> > > If the flow size is bigger than UINT16_MAX it doesn't >> > > fit into openflow message. Programming of such flow will >> > > fail which results in disconnect of ofctrl. After reconnect >> > > we program everything from scratch, in case the long flow still >> > > remains the cycle continues. This causes the node to be almost >> > > unusable as there will be massive traffic disruptions. >> > > >> > > To prevent that check if the flow is within the allowed size. >> > > If not log the flow that would trigger this problem and do not program >> > > it. This isn't a self-healing process, but the chance of this >> happening >> > > are very slim. Also, any flow that is bigger than allowed size is OVN >> > > bug, and it should be fixed. >> > > >> > > Reported-at: https://bugzilla.redhat.com/1955167 >> > > Signed-off-by: Ales Musil <[email protected]> >> > > --- >> > > v2: Fix the formatting error. >> > > v3: Rebase on top of current main. >> > > Add coverage counter for the too long error. >> > > --- >> > > controller/ofctrl.c | 45 >> +++++++++++++++++++++++++++++++++++++++++---- >> > > 1 file changed, 41 insertions(+), 4 deletions(-) >> > > >> > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c >> > > index 64a444ff6..a8cda4fe8 100644 >> > > --- a/controller/ofctrl.c >> > > +++ b/controller/ofctrl.c >> > > @@ -16,6 +16,7 @@ >> > > #include <config.h> >> > > #include "bitmap.h" >> > > #include "byte-order.h" >> > > +#include "coverage.h" >> > > #include "dirs.h" >> > > #include "dp-packet.h" >> > > #include "flow.h" >> > > @@ -55,6 +56,8 @@ >> > > >> > > VLOG_DEFINE_THIS_MODULE(ofctrl); >> > > >> > > +COVERAGE_DEFINE(ofctrl_msg_too_long); >> > > + >> > > /* An OpenFlow flow. */ >> > > struct ovn_flow { >> > > /* Key. */ >> > > @@ -1769,6 +1772,18 @@ ovn_flow_log(const struct ovn_flow *f, const >> char *action) >> > > } >> > > } >> > > >> > > +static void >> > > +ovn_flow_log_size_err(const struct ovn_flow *f) >> > > +{ >> > > + COVERAGE_INC(ofctrl_msg_too_long); >> > > + >> > > + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); >> > > + >> > > + char *s = ovn_flow_to_string(f); >> > > + VLOG_ERR_RL(&rl, "The FLOW_MOD message is too big: %s", s); >> > > + free(s); >> > > +} >> > > + >> > > static void >> > > ovn_flow_uninit(struct ovn_flow *f) >> > > { >> > > @@ -1888,15 +1903,27 @@ encode_bundle_add(struct ofpbuf *msg, struct >> ofputil_bundle_ctrl_msg *bc) >> > > return ofputil_encode_bundle_add(OFP15_VERSION, &bam); >> > > } >> > > >> > > -static void >> > > +static bool >> > > add_flow_mod(struct ofputil_flow_mod *fm, >> > > struct ofputil_bundle_ctrl_msg *bc, >> > > struct ovs_list *msgs) >> > > { >> > > struct ofpbuf *msg = encode_flow_mod(fm); >> > > struct ofpbuf *bundle_msg = encode_bundle_add(msg, bc); >> > > + >> > > + uint32_t flow_mod_len = msg->size; >> > > + uint32_t bundle_len = bundle_msg->size; >> > > + >> > > ofpbuf_delete(msg); >> > > + >> > > + if (flow_mod_len > UINT16_MAX || bundle_len > UINT16_MAX) { >> > > + ofpbuf_delete(bundle_msg); >> > > + >> > > + return false; >> > > + } >> > > + >> > > ovs_list_push_back(msgs, &bundle_msg->list_node); >> > > + return true; >> > > } >> > > >> > > /* group_table. */ >> > > @@ -2235,7 +2262,10 @@ installed_flow_add(struct ovn_flow *d, >> > > .new_cookie = htonll(d->cookie), >> > > .command = OFPFC_ADD, >> > > }; >> > > - add_flow_mod(&fm, bc, msgs); >> > > + >> > > + if (!add_flow_mod(&fm, bc, msgs)) { >> > > + ovn_flow_log_size_err(d); >> > > + } >> > > } >> > > >> > > static void >> > > @@ -2259,7 +2289,7 @@ installed_flow_mod(struct ovn_flow *i, struct >> ovn_flow *d, >> > > /* Use OFPFC_ADD so that cookie can be updated. */ >> > > fm.command = OFPFC_ADD; >> > > } >> > > - add_flow_mod(&fm, bc, msgs); >> > > + bool result = add_flow_mod(&fm, bc, msgs); >> > > >> > > /* Replace 'i''s actions and cookie by 'd''s. */ >> > > mem_stats.installed_flow_usage -= i->ofpacts_len - >> d->ofpacts_len; >> > > @@ -2267,6 +2297,10 @@ installed_flow_mod(struct ovn_flow *i, struct >> ovn_flow *d, >> > > i->ofpacts = xmemdup(d->ofpacts, d->ofpacts_len); >> > > i->ofpacts_len = d->ofpacts_len; >> > > i->cookie = d->cookie; >> > > + >> > > + if (!result) { >> > > + ovn_flow_log_size_err(i); >> > > + } >> > > } >> > > >> > > static void >> > > @@ -2280,7 +2314,10 @@ installed_flow_del(struct ovn_flow *i, >> > > .table_id = i->table_id, >> > > .command = OFPFC_DELETE_STRICT, >> > > }; >> > > - add_flow_mod(&fm, bc, msgs); >> > > + >> > > + if (!add_flow_mod(&fm, bc, msgs)) { >> > > + ovn_flow_log_size_err(i); >> > > + } >> > > } >> > > >> > > static void >> > >> > _______________________________________________ >> > dev mailing list >> > [email protected] >> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> -- >> >> >> >> >> _‘Esta mensagem é direcionada apenas para os endereços constantes no >> cabeçalho inicial. Se você não está listado nos endereços constantes no >> cabeçalho, pedimos-lhe que desconsidere completamente o conteúdo dessa >> mensagem e cuja cópia, encaminhamento e/ou execução das ações citadas >> estão >> imediatamente anuladas e proibidas’._ >> >> >> * **‘Apesar do Magazine Luiza tomar >> todas as precauções razoáveis para assegurar que nenhum vírus esteja >> presente nesse e-mail, a empresa não poderá aceitar a responsabilidade >> por >> quaisquer perdas ou danos causados por esse e-mail ou por seus anexos’.* >> >> >> >> > > -- > > Ales Musil > > Senior Software Engineer - OVN Core > > Red Hat EMEA <https://www.redhat.com> > > [email protected] > <https://red.ht/sig> > -- _‘Esta mensagem é direcionada apenas para os endereços constantes no cabeçalho inicial. Se você não está listado nos endereços constantes no cabeçalho, pedimos-lhe que desconsidere completamente o conteúdo dessa mensagem e cuja cópia, encaminhamento e/ou execução das ações citadas estão imediatamente anuladas e proibidas’._ * **‘Apesar do Magazine Luiza tomar todas as precauções razoáveis para assegurar que nenhum vírus esteja presente nesse e-mail, a empresa não poderá aceitar a responsabilidade por quaisquer perdas ou danos causados por esse e-mail ou por seus anexos’.* _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
