Paolo Valerio <pvale...@redhat.com> writes:

> From: Gaetan Rivet <gr...@u256.net>
>
> This patch aims to replace the expiration lists as, due to the way
> they are used, besides being a source of contention, they have a known
> issue when used with non-default policies for different zones that
> could lead to retaining expired connections potentially for a long
> time.
>
> This patch replaces them with an array of rculist used to distribute
> all the newly created connections in order to, during the sweeping
> phase, scan them without locking, and evict the expired connections
> only locking during the actual removal.  This allows to reduce the
> contention introduced by the pushback performed at every packet
> update, also solving the issue related to zones and timeout policies.
>
> Signed-off-by: Gaetan Rivet <gr...@u256.net>
> Co-authored-by: Paolo Valerio <pvale...@redhat.com>
> Signed-off-by: Paolo Valerio <pvale...@redhat.com>
> ---
> v6:
> - minor function renaming
> - removed conn->lock in conn_clean() as this was unneeded.
> - minor commit message rephrase
> ---

I have a small incremental diff that changes minor things:

- changes/remove some comments
- next_wake changed from 30 to 20
- removes MAX() as next_wake is always used

I'm planning to send it by the end of Monday so that if any feedback
comes can be folded into a single re-spin.

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 819c356c1..66a44da2e 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1552,11 +1552,7 @@ set_label(struct dp_packet *pkt, struct conn *conn,
 }
 
 
-/* Delete the expired connections from 'ctb', up to 'limit'. Returns the
- * earliest expiration time among the remaining connections in 'ctb'.  Returns
- * LLONG_MAX if 'ctb' is empty.  The return value might be smaller than 'now',
- * if 'limit' is reached */
-static long long
+static size_t
 ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
     OVS_NO_THREAD_SAFETY_ANALYSIS
 {
@@ -1574,16 +1570,15 @@ ct_sweep(struct conntrack *ct, struct rculist *list, 
long long now)
     return count;
 }
 
-/* Cleans up old connection entries from 'ct'.  Returns the time when the
- * next expiration might happen.  The return value might be smaller than
- * 'now', meaning that an internal limit has been reached, and some expired
- * connections have not been deleted. */
+/* Cleans up old connection entries from 'ct'.  Returns the time
+ * when the next wake will happen. The return value might be zero,
+ * meaning that an internal limit has been reached. */
 static long long
 conntrack_clean(struct conntrack *ct, long long now)
 {
-    long long next_wakeup = now + 30 * 1000;
-    unsigned int n_conn_limit, i, count = 0;
-    size_t clean_end;
+    long long next_wakeup = now + 20 * 1000;
+    unsigned int n_conn_limit, i;
+    size_t clean_end, count = 0;
 
     atomic_read_relaxed(&ct->n_conn_limit, &n_conn_limit);
     clean_end = n_conn_limit / 64;
@@ -1599,7 +1594,7 @@ conntrack_clean(struct conntrack *ct, long long now)
 
     ct->next_sweep = (i < EXP_LISTS) ? i : 0;
 
-    VLOG_DBG("conntrack cleanup %"PRIu32" entries in %lld msec", count,
+    VLOG_DBG("conntrack cleanup %"PRIuSIZE" entries in %lld msec", count,
              time_msec() - now);
 
     return next_wakeup;
@@ -1608,24 +1603,8 @@ conntrack_clean(struct conntrack *ct, long long now)
 /* Cleanup:
  *
  * We must call conntrack_clean() periodically.  conntrack_clean() return
- * value gives an hint on when the next cleanup must be done (either because
- * there is an actual connection that expires, or because a new connection
- * might be created with the minimum timeout).
- *
- * The logic below has two goals:
- *
- * - We want to reduce the number of wakeups and batch connection cleanup
- *   when the load is not very high.  CT_CLEAN_INTERVAL ensures that if we
- *   are coping with the current cleanup tasks, then we wait at least
- *   5 seconds to do further cleanup.
- *
- * - We don't want to keep the map locked too long, as we might prevent
- *   traffic from flowing.  CT_CLEAN_MIN_INTERVAL ensures that if cleanup is
- *   behind, there is at least some 200ms blocks of time when the map will be
- *   left alone, so the datapath can operate unhindered.
- */
-#define CT_CLEAN_INTERVAL 5000 /* 5 seconds */
-#define CT_CLEAN_MIN_INTERVAL 200  /* 0.2 seconds */
+ * value gives an hint on when the next cleanup must be done. */
+#define CT_CLEAN_MIN_INTERVAL_MS 200
 
 static void *
 clean_thread_main(void *f_)
@@ -1639,9 +1618,9 @@ clean_thread_main(void *f_)
         next_wake = conntrack_clean(ct, now);
 
         if (next_wake < now) {
-            poll_timer_wait_until(now + CT_CLEAN_MIN_INTERVAL);
+            poll_timer_wait_until(now + CT_CLEAN_MIN_INTERVAL_MS);
         } else {
-            poll_timer_wait_until(MAX(next_wake, now + CT_CLEAN_INTERVAL));
+            poll_timer_wait_until(next_wake);
         }
         latch_wait(&ct->clean_thread_exit);
         poll_block();

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to