On 17/08/2021 13:52, Ilya Maximets wrote:
On 8/17/21 1:27 PM, Anton Ivanov wrote:
Hi Ilya, hi list,

I ran some detailed experiments and there is an issue with all forms of 
"skipping" and/or reordering processing.

If the session list is skipped or reordered (I tried "fast-forwarding" the list 
to a new head position after hitting a time constraint), ovsdb fails to issue the 
response to some transactions when running the cluster test suite.

At present I am unable to get to the root cause.

The issue does not exist if processing bails out of the session loop and is 
re-run IN FULL (as in the earliest versions of the patch).
That is weird.  I'm not sure how the re-ordering is different from
the 're-run in full' here.  The only thing that is different is the
actual order in which sessions are processed, because we're still
re-running all of them in full for as long as time allows.

It may be the way I am reordering - directly manipulating the list head and 
pointers instead of popping and pushing.

If it works via the brute-force method (pop/push), I will leave that as the first 
iteration and try to figure out what extra macros are needed in list.h at a 
later date.
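
For illustration only, the pop/push reordering mentioned above could be sketched 
roughly as follows. This is Python for brevity rather than the C used in 
ovsdb-server, and run_sessions, budget and cost are hypothetical names invented 
for the sketch, not anything from the tree or from list.h. The time constraint 
is modelled as abstract work units instead of wall-clock time.

```python
from collections import deque


def run_sessions(sessions, budget, cost):
    """Process sessions from the head until `budget` work units are spent.

    Each processed session is popped from the front and pushed to the back,
    so the next call starts with the sessions that were not reached.
    Returns the sessions processed in this call, in order.
    """
    processed = []
    spent = 0
    for _ in range(len(sessions)):      # visit each session at most once
        if spent >= budget:
            break                       # out of time: bail out of the loop
        session = sessions.popleft()    # pop from the head ...
        spent += cost[session]          # hypothetical per-session work
        processed.append(session)
        sessions.append(session)        # ... and push to the tail
    return processed


# Hypothetical sessions, each costing 2 work units to process.
sessions = deque(["s1", "s2", "s3", "s4"])
cost = {"s1": 2, "s2": 2, "s3": 2, "s4": 2}

first = run_sessions(sessions, budget=4, cost=cost)
second = run_sessions(sessions, budget=4, cost=cost)
```

With these made-up numbers the first call handles s1 and s2, the second call 
handles s3 and s4, and the list is back in its original order afterwards. 
Whether this matches the behaviour of the direct head manipulation is exactly 
what remains to be checked.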

Brgds,


I am going to re-issue the patch without any skipping whatsoever (either at 
the remotes or at the sessions level), because that works and improves Raft (and 
overall OVN) stability.

While there may be some starvation of the sessions towards the end of the 
session list, it should be a second-order effect, because re-processing 
sessions which have just been processed generates only a minimal amount of 
changes.

Skipping (if any) will be a later optimization after I get to the bottom of 
this and figure out why monitor updates are not followed by the transaction 
response.
This doesn't sound good to me.  It's pretty easy to spam the
ovsdb-server with monitor requests or condition changes.  Handling
these requires a walk across the whole database.  And if the database
is big enough, other sessions will never be served due to one
faulty/malicious connection.   It's also possible that we
have a few thousand connections and processing all of them
legitimately takes a lot of time.  This will be a problem
if the rate of database changes is relatively high and constant.
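
One reading of this concern can be shown with a toy model. The sketch below is 
Python for brevity (ovsdb-server itself is C), and restart_from_head, budget, 
cost and the session names are all hypothetical. It assumes the expensive 
session sits ahead of the others in the list, always has work pending, and that 
every run restarts from the head after the time budget is exhausted.

```python
def restart_from_head(sessions, budget, cost, rounds):
    """Count how often each session is served when every run restarts at the head.

    A run stops as soon as `budget` work units are spent; the next run
    begins again with the first session in the list.
    """
    served = {s: 0 for s in sessions}
    for _ in range(rounds):
        spent = 0
        for s in sessions:
            if spent >= budget:
                break                   # time constraint hit: bail out
            spent += cost[s]            # hypothetical per-session work
            served[s] += 1
    return served


# Hypothetical workload: one connection whose requests use the whole budget.
sessions = ["spammer", "s2", "s3"]
cost = {"spammer": 10, "s2": 1, "s3": 1}
served = restart_from_head(sessions, budget=10, cost=cost, rounds=5)
```

Under these assumptions the first session is served in all five runs and the 
other two are never reached. Real workloads will be less extreme, but the 
model shows why restarting from the head gives no fairness guarantee by itself.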

Best regards, Ilya Maximets.

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

