On 4/12/25 5:52 PM, Numan Siddique wrote:
> On Wed, Apr 9, 2025 at 5:08 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>
>> On 4/9/25 5:43 PM, Numan Siddique wrote:
>>> Hi OVS maintainers and others,
>>>
>>> In our deployments where we run OVS 3.2.2, we see a crash in
>>> ovs-vswitchd when the ovn-controller
>>> gets upgraded from 23.09 to 24.09.  The crash is seen exactly when
>>> ovn-controller restarts
>>> and connects to ovs-vswitchd.  We have configured
>>> ovn-ofctrl-wait-before-clear="300000".
>>> So ovn-controller doesn't delete the existing openflow rules as soon
>>> as it connects to ovs-vswitchd.
>>>
>>> In 24.09, ovn-controller now configures the flow_table prefixes for
>>> each table [1] which was not
>>> the case in 23.09.
>>>
>>> We couldn't reproduce the issue again and it was seen only once on a
>>> compute node.
>>> Other compute nodes in the deployment didn't see this crash when the
>>> upgrade happened.
>>>
>>> I suspect if we are hitting the same issue which this commit tried to fix 
>>> [2].
>>
>> Yeah, the whole thing how trie updates are managed doesn't seem right.
>> The memory fence in the classifier_lookup__() doesn't actually do much,
>> because cls->n_tries is read multiple times after the fence and
>> subtable->trie_plen is accessed after the fence.  So, I'm not sure
>> ordering of what this fence is trying to ensure.
>>
>> The following sequence of events seems very possible:
>> 0. n_tries is 2 from the beginning.
>> 1. Main thread calls classifier_set_prefix_fields() and counts 3 tries.
>> 2. Main thread clears subtable->trie_plen for 2 current tries.
>> 3. Main thread waits for RCU to synchronize and continues.
>> 4. Handler thread starts classification for some packet and calls
>>    classifier_lookup__().
>> 5. Handler thread copies pointers for 2 current tries into trie_ctx[].
>> 6. Main thread initializes 3 tries, sets subtable->trie_plen for all
>>    three of them and sets n_tries to 3.
>> 7. Handler thread calls find_match_wc(trie_ctx[], cls->n_tries == 3).
>> 8. Handler checks subtable->trie_plen[2] is set and attempts to use
>>    trie_ctx[2] that is not initialized, as only two pointers were copied.
>>
>> One obvious thing here is that n_tries must be read only once in the
>> classifier_lookup__(), so we do not access elements of the trie_ctx[]
>> that were not initialized.
>>
>> A larger change with the re-evaluation of the whole thread fence thing
>> might look something like this:
>>
>> 1. Convert cls->n_tries into atomic.
>> 2. In classifier_set_prefix_fields():
>> 2.1. Before making any changes, set cls->n_tries to zero with the
>>      memory_order_release (this one may be relaxed?).
>> 2.2. ovsrcu_synchronize() to wait for all threads to stop using tries.
>> 2.3. Update tries while nobody is using them.
>> 2.4. Set cls->n_tries to a new value with memory_order_release.
>> 3. In the classifier_lookup__(), read cls->n_tries with the
>>    memory_order_acquire and use that once read value throughout.
>>
>> Current code is trying to be smart and not disable trie lookups for
>> tries that are not changing.  But it may be better to just turn them
>> all off, as the schema above will do, and run without tries for a
>> short period of time until they are reconfigured.  With that schema
>> we can probably revert [1] and avoid clearing trie_plen fields and
>> so cleanup a bit of relevant code.  But I didn't look that deep yet.
>>
> 
> Thanks Ilya for the explanation.

Np.

> I'm not familiar with the classifier code.  I'll see if I can take a
> look at the fix.

We don't really have many people who knows the classifier code well.
So, having new eyes on it is good for a long run. :)

> If you or others want to address it, feel free to look at it.

I'll be off for a week, but if you'll find you don't have much time for
this, let me know, I can take over once I'm back.

Thanks!

Best regards, Ilya Maximets.

> 
> Thanks
> Numan
> 
>> Best regards, Ilya Maximets.
>>
>>>
>>> From the coredump analysis,
>>> `check_tries` function at `../lib/classifier.c:1585` is where the
>>> crash seems to occur.
>>> It's trying to access `ctx->trie->field` and it is NULL.
>>>
>>> Any pointers on what could be going on here ?
>>>
>>> [1] - 8d64f2b7dc ("controller: Fix IPv6 dp flow explosion by setting
>>> flow table prefixes.")
>>>       
>>> https://github.com/ovn-org/ovn/commit/8d64f2b7dcef24d175151ba5e0732281cdeb6d54
>>>
>>> [2] - a6117059904 "(classifier: Prevent tries vs n_tries race leading
>>> to NULL dereference.")
>>>       
>>> https://github.com/openvswitch/ovs/commit/a6117059904bb692039c926221964dd6d49b3bfd

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to