This series shares the work done so far and is meant to start a discussion about a slightly different approach to the way we handle connections. It is not considered ready yet, as further work and extensive testing are needed.
There are two main logical changes that the series introduces: the first is the reintroduction of buckets, the second is the removal of the expiration lists.

The first two patches are prerequisites and were picked (untouched) from this series (by Gaetan):

https://patchwork.ozlabs.org/project/openvswitch/list/?series=249027&state=*

The third patch is a follow-up to:

https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/

Some of the logic has changed, but the overall idea remains the same. Additional details are in the patch description.

The fourth patch introduces the buckets and removes the expiration lists. The buckets are reintroduced as cmaps instead of hmaps, also trying to narrow down the critical sections (a rough sketch of the layout follows the diffstat below). An alternative approach could replace the cmaps with linked lists (without increasing the number of locks) and increase the number of buckets (possibly allowing the bucket count to change when the user changes the connection limit). This could improve the eviction of stale connections during packet processing, but it is currently out of scope for this series (it can be discussed, though).

The insertion path, i.e. the creation of a connection, can probably be improved (or at least an improvement can be evaluated): in the NAT case it acquires the bucket locks by calling nat_get_unique_tuple_lock(). This is needed because we may need to know the reverse tuple before locking, in order to acquire the locks in the right order and avoid ABBA deadlocks (see the second sketch after the diffstat). Further details about locking can be found in the description of patch #4.

There is a known issue, mostly related to the zone limit. Keeping the stale entries leads to a mismatch between the current number of per-zone connections and the actual number of valid connections. This means that with a small zone limit we may wait a whole "next_wakeup" interval before the entries get cleaned up (assuming we have candidates for removal). This could be avoided, e.g. by triggering a per-zone cleanup when a connection threshold is reached (which may require a different way of accounting the per-zone connections; see the third sketch after the diffstat). Other approaches are possible (e.g. cleaning the zone from the packet path during insertion), but both require additional handling.

The expiration lists have been removed for the sake of evaluation, but they could be brought back and duplicated among the buckets. This way we could distribute the contention (effectively reducing the number of connections per list) that affects the write access to a single data structure on every connection update.

Patch #5 reintegrates the command below with multiple buckets:

ovs-appctl dpctl/ct-bkts

A couple of minor patches have been kept out of the series for the time being, as they strongly depend on #4.

Gaetan Rivet (2):
  conntrack: Use a cmap to store zone limits
  conntrack-tp: Use a cmap to store timeout policies

Paolo Valerio (2):
  conntrack: Split single cmap to multiple buckets.
  conntrack: Make ovs-appctl dpctl/ct-bkts work with multiple buckets

Peng He (1):
  conntrack: Replaces nat_conn introducing key directionality.

 lib/conntrack-private.h |  58 ++-
 lib/conntrack-tp.c      | 102 ++---
 lib/conntrack.c         | 952 ++++++++++++++++++++++------------------
 lib/conntrack.h         |   4 +-
 lib/dpif-netdev.c       |   5 +-
 tests/system-traffic.at |   5 +-
 6 files changed, 608 insertions(+), 518 deletions(-)
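First sketch: a minimal illustration of the bucketed layout described above, not the actual patch code. It abstracts the OVS cmap and ovs_mutex types behind plain C stand-ins, and the bucket count and all names (CT_N_BUCKETS, ct_bucket, ct_hash_to_bucket) are hypothetical.

#include <pthread.h>
#include <stdint.h>

#define CT_N_BUCKETS 32              /* hypothetical, fixed bucket count */

struct ct_bucket {
    pthread_mutex_t lock;            /* stands in for the per-bucket lock */
    void *conns;                     /* stands in for the per-bucket cmap */
};

struct conntrack_sketch {
    struct ct_bucket buckets[CT_N_BUCKETS];
};

/* Map a connection key hash to the bucket that owns it, so that updates
 * to connections hashing to different buckets do not contend on a single
 * global structure. */
static inline uint32_t
ct_hash_to_bucket(uint32_t hash)
{
    return hash % CT_N_BUCKETS;
}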
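Second sketch: the lock-ordering point behind nat_get_unique_tuple_lock(). When NAT may need both the forward and the reverse tuple's bucket, the locks are taken in ascending bucket order (and only once if both tuples land in the same bucket), which avoids ABBA deadlocks between two threads locking the same pair of buckets in opposite orders. The helper name and signature are purely illustrative.

#include <pthread.h>
#include <stdint.h>

/* Lock the buckets for the forward and reverse tuples in a fixed
 * (ascending index) order, so two threads can never hold them in
 * opposite orders. */
static void
ct_lock_two_buckets(pthread_mutex_t *bucket_locks,
                    uint32_t fwd_bucket, uint32_t rev_bucket)
{
    if (fwd_bucket == rev_bucket) {
        pthread_mutex_lock(&bucket_locks[fwd_bucket]);
    } else if (fwd_bucket < rev_bucket) {
        pthread_mutex_lock(&bucket_locks[fwd_bucket]);
        pthread_mutex_lock(&bucket_locks[rev_bucket]);
    } else {
        pthread_mutex_lock(&bucket_locks[rev_bucket]);
        pthread_mutex_lock(&bucket_locks[fwd_bucket]);
    }
}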
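Third sketch: the threshold idea mentioned for the zone limit issue, which is not implemented in this series. Once the per-zone count (which still includes stale entries) crosses a fraction of the configured limit, an early cleanup of that zone would be triggered instead of waiting for the next periodic wakeup. The structure, the helper and the 90% threshold are all hypothetical.

#include <stdbool.h>
#include <stdint.h>

struct zone_limit_sketch {
    uint32_t limit;   /* configured per-zone connection limit */
    uint32_t count;   /* current per-zone count, stale entries included */
};

/* Return true when an early sweep of the zone's stale entries is wanted,
 * i.e. the accounted count is within 10% of the configured limit. */
static bool
zone_cleanup_needed(const struct zone_limit_sketch *zl)
{
    return zl->limit && zl->count >= zl->limit - zl->limit / 10;
}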
