On 11/29/23 16:45, Xavier Simonart wrote:
> The group_table and meter_table are initialized in ovn-controller, with n_ids 
> = 0.
> Then they are re-initialized in ofctrl, with proper number of n_ids, in state 
> S_CLEAR_FLOWS.
> However, nothing prevented to start adding flows (by adding logical ports) 
> using groups
> before ofctrl reaches this state. This was causing some wrong flows (missing 
> group in the flow).
> 
> With this patch, as soon as the feature rconn is available, i.e. before 
> adding flows, those table
> are properly re-initialized.
> 
> This issue is usually not visible in main and recent branches ci, since
> "Wait for new ovn-controllers to connect to Southbound." as this was slowing 
> down the moment when
> the test started to add ports.
> This was causing the following test to fail:
> "ECMP static routes -- ovn-northd -- parallelization=yes -- 
> ovn_monitor_all=yes".
> 
> Fixes: 1d6d953bf883 ("controller: Don't artificially limit group and meter 
> IDs to 16bit.")
> Signed-off-by: Xavier Simonart <xsimo...@redhat.com>
> ---

Thanks for the patch, Xavier!  And sorry for introducing the issue in
the first place.

>  controller/ovn-controller.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 44605eb4e..98d95b1f3 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -5654,6 +5654,14 @@ main(int argc, char *argv[])
>                                             br_int ? br_int->name : NULL)) {
>                  VLOG_INFO("OVS feature set changed, force recompute.");
>                  engine_set_force_recompute(true);
> +                if (ovs_feature_set_discovered()) {
> +                    lflow_output_data =
> +                        engine_get_internal_data(&en_lflow_output);
> +                    ovn_extend_table_reinit(&lflow_output_data->group_table,
> +                                        ovs_feature_max_select_groups_get());
> +                    ovn_extend_table_reinit(&lflow_output_data->meter_table,
> +                                            ovs_feature_max_meters_get());

I don't think this is correct.  What if the supported number of
groups/meters actually changed, from X to Y, with X != 0?  In that case
ovs_feature_support_run() would also return true and we'll reinitialize
the "extend_tables" (call ovn_extend_table_reinit()).  That currently
just recreates the id_pool but doesn't touch the
"ovn_extend_table->existing" map.

Doesn't that mean that we'd potentially end up in an inconsistent state?

Funny thing, in the v3 of the original patch that introduced the problem
I was calling ovn_extend_table_clear() from ovn_extend_table_reinit()
but we decided to remove it as it was already done by the caller. :)

https://patchwork.ozlabs.org/project/ovn/patch/20231026113713.1718954-1-dce...@redhat.com/#3209437

Thinking more about it, I wonder, isn't it enough to change the way this
works and just call ovn_extend_table_reinit() here instead of in
run_S_CLEAR_FLOWS()?  We should also clear the "existing" map inside
ovn_extend_table_reinit() if we're changing the size of the id_pool.

Wpuldn't that properly cover all scenarios?

> +                }
>              }
>  
>              if (br_int && ovs_feature_set_discovered()) {

Regards,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to