On 12/13/22 17:02, Dumitru Ceara wrote:
> On 12/13/22 14:56, Ilya Maximets wrote:
>> On 12/13/22 13:21, Dumitru Ceara wrote:
>>> On 12/13/22 11:57, Ilya Maximets wrote:
>>>> On 12/12/22 20:42, Ilya Maximets wrote:
>>>>> On 12/8/22 16:21, Dumitru Ceara wrote:
>>>>>> Applications request specific monitor conditions via the
>>>>>> ovsdb_cs_set_condition() API. This returns the condition sequence
>>>>>> number at which the client can assume that the server acknowledged
>>>>>> and applied the new monitor condition.
>>>>>>
>>>>>> A corner case is when the client's first request is to set the condition
>>>>>> to a value that's equivalent to the default one ("<true>"). In such
>>>>>> cases, because it's requested explicitly by the client, we now
>>>>>> explicitly send the monitor_cond_change request to the server. This
>>>>>> avoids cases in which the expected condition sequence number and the
>>>>>> acked condition sequence number get out of sync.
>>>>>
>>>>> Sorry, I think I'm lost again. What exactly we're trying to fix here?
>>>>> [True] is a default condition if no other conditions are supplied for
>>>>> the table. Why do we need to send it explicitly?
>>>>>
>>>
>>> The problem is that there's a discrepancy between the returned expected
>>> condition seqno and what happens if a condition is explicitly set to
>>> [True] before vs after the initial connection happened.
>>>
>>>>> At first I thought that newly created condition from
>>>>> ovsdb_cs_db_get_table()
>>>>> has to be sent out, but... if it is the first time we're setting
>>>>> conditions for that table, default condition should be [True], hence
>>>>> equal to the condition we're trying to set.
>>>>>
>>>>>
>>>>> The test case is a bit synthetic as it doesn't seem to test the issue
>>>>> that exists in OVN. The fact that python IDL passes the test without
>>>>> changes also doesn't make a lot of sense to me.
>>>>>
>>>
>>> Actually it should test exactly what's happenning in OVN. That is:
>>>
>>> - connect to database, no explicit monitor condition set
>>> - get initial contents
>>> - explicitly set monitor condition to [True] and get seqno S at
>>> which the condition should be "acked".
>>> - expect that ovsdb_idl_get_condition_seqno() eventually reaches S
>>>
>>> The Python IDL behaves differently unfortunately, I'll give more details
>>> below.
>>>
>>>>> Could you provide some more data on what exactly is going on and what
>>>>> we're trying to fix? With jsonrpc logs, if possible.
>>>>
>>>
>>> For the C client, in the ovn-controller case, let's focus on 2 tables
>>> Chassis_Private and Chassis_Template_Var. The difference between these
>>> 2 is that Chassis_Template_Var was recently added to the schema so
>>> ovn-controller only sets its monitor condition if
>>> sbrec_server_has_chassis_template_var_table() is true. So that's only
>>> after the remote schema was received.
>>>
>>> Sorry for the wide table below but this is the best way I could find
>>> that should explain what's happening:
>>>
>>> operation/event reconnect-state
>>> ovsdb-cs-state Chassis_Private-new_cond
>>> Chassis_Template_Var-new_cond cond-seqno expected-cond-seqno
>>>
>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> 1. set_cond(Chassis_Private, [True])
>>> BACKOFF
>>> SERVER_SCHEMA_REQUESTED [True]
>>> non-existent 0 1
>>
>>
>> I think, this very first operation/event is wrong.
>> The condition change is never going to be sent, but the expected
>> sequence number is reported as 1. The only reason this doesn't
>> cause any problems right away is that we're going to reconnect
>> here and drop the state, so the application will not rely on that
>> number in the end. But it is still an incorrect return value.
>>
>> The change that I suggested should fix that fundamental issue.
>>
>
> It should, yes.
>
>> <snip>
>>
>>>>
>>>> Should we do this instead:
>>>>
>>>> diff --git a/lib/ovsdb-cs.c b/lib/ovsdb-cs.c
>>>> index a6fbd290c..0fca03d72 100644
>>>> --- a/lib/ovsdb-cs.c
>>>> +++ b/lib/ovsdb-cs.c
>>>> @@ -892,7 +892,7 @@ ovsdb_cs_db_get_table(struct ovsdb_cs_db *db, const
>>>> char *table)
>>>>
>>>> t = xzalloc(sizeof *t);
>>>> t->name = xstrdup(table);
>>>> - t->new_cond = json_array_create_1(json_boolean_create(true));
>>>> + t->ack_cond = json_array_create_1(json_boolean_create(true));
>>>> hmap_insert(&db->tables, &t->hmap_node, hash);
>>>> return t;
>>>> }
>>>> ---
>>>>
>>>> ?
>>>>
>>>> This breaks a few tests, but it seems to be a correct change because it
>>>> is indeed a default condition that server already has for the table.
>>>>
>>>> What do you think? Will that fix the OVN oproblem?
>>>>
>>>
>>> That would fix the OVN problem I think. I'm not sure how to rewrite the
>>> "[track, simple idl, initially populated, strong references, conditional]"
>>> test however.
>>
>> Something like this should fix it:
>>
>> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
>> index 57ed0ac9d..1846cb521 100644
>> --- a/tests/ovsdb-idl.at
>> +++ b/tests/ovsdb-idl.at
>> @@ -1585,10 +1585,12 @@ OVSDB_CHECK_IDL([simple idl, initially populated,
>> strong references, conditional
>> {"op": "insert",
>> "table": "simple",
>> "row": {"s": "row0_s"}}]']],
>> - [[000: change conditions
>> + [[000: simple3: conditions unchanged
>> +000: simple4: conditions unchanged
>> +000: simple: conditions unchanged
>> 001: table simple3: name=row0_s3 uset=[] uref=[<0>] uuid=<1>
>> 001: table simple4: name=row0_s4 uuid=<0>
>> -002: change conditions
>> +002: simple4: change conditions
>> 003: table simple3: name=row0_s3 uset=[] uref=[] uuid=<1>
>> 004: {"error":null,"result":[{"uuid":["uuid","<2>"]}]}
>> 005: table simple3: name=row0_s3 uset=[] uref=[] uuid=<1>
>> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
>> index 23cfd79a4..1bc5ac17a 100644
>> --- a/tests/test-ovsdb.c
>> +++ b/tests/test-ovsdb.c
>> @@ -2628,7 +2628,7 @@ parse_link2_json_clause(struct ovsdb_idl_condition
>> *cond,
>> }
>>
>> static unsigned int
>> -update_conditions(struct ovsdb_idl *idl, char *commands)
>> +update_conditions(struct ovsdb_idl *idl, char *commands, int step)
>> {
>> const struct ovsdb_idl_table_class *tc;
>> unsigned int next_cond_seqno = 0;
>> @@ -2683,7 +2683,10 @@ update_conditions(struct ovsdb_idl *idl, char
>> *commands)
>> unsigned int seqno = ovsdb_idl_get_condition_seqno(idl);
>> unsigned int next_seqno = ovsdb_idl_set_condition(idl, tc, &cond);
>> if (seqno == next_seqno ) {
>> - ovs_fatal(0, "condition unchanged");
>> + print_and_log("%03d: %s: conditions unchanged",
>> + step, table_name);
>> + } else {
>> + print_and_log("%03d: %s: change conditions", step, table_name);
>> }
>> unsigned int new_next_seqno = ovsdb_idl_set_condition(idl, tc,
>> &cond);
>> if (next_seqno != new_next_seqno) {
>> @@ -2740,8 +2743,7 @@ do_idl(struct ovs_cmdl_context *ctx)
>> const char cond_s[] = "condition ";
>> if (ctx->argc > 2 && strstr(ctx->argv[2], cond_s)) {
>> next_cond_seqno =
>> - update_conditions(idl, ctx->argv[2] + strlen(cond_s));
>> - print_and_log("%03d: change conditions", step++);
>> + update_conditions(idl, ctx->argv[2] + strlen(cond_s), step++);
>> i = 3;
>> } else {
>> i = 2;
>> @@ -2809,8 +2811,8 @@ do_idl(struct ovs_cmdl_context *ctx)
>> arg + strlen(remote_s),
>> ovsdb_idl_is_connected(idl) ? "true" : "false");
>> } else if (!strncmp(arg, cond_s, strlen(cond_s))) {
>> - next_cond_seqno = update_conditions(idl, arg + strlen(cond_s));
>> - print_and_log("%03d: change conditions", step++);
>> + next_cond_seqno = update_conditions(idl, arg + strlen(cond_s),
>> + step++);
>> } else if (arg[0] != '[') {
>> if (!idl_set(idl, arg, step++)) {
>> /* If idl_set() returns false, then no transaction
>> ---
>>
>> I.e. just don't crash if canditions didn't change, because that is expected.
>>
>>
>
> Ok.
>
>>>
>>> On the other had, I had something else in mind: what if we just do the
>>> same as the python IDL? That is, create all ovsdb-cs tables immediately
>>> after the cs was created. Like that we we don't change any other IDL
>>> behavior. I'm also a bit worried about fundamental changes given
>>> that we need this change in OVN before the release that should happen
>>> this week.
>>>
>>> My suggested change (on top of master) is a bit larger.. Let me know
>>> what you think.
>>
>> I'm not sure about that. This doesn't change the fundamental issue, IMO,
>> but masks it out.
>>
>
> I'm ok either way in the end. Do you plan to post a formal patch with
> your version or should I?
Please, go ahead. Thanks!
>
> Thanks,
> Dumitru
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev