From: Wilson Peng <[email protected]>

v2-v3 change: Remove the unneeded sanity check and only correct the
error handling for the case where the call to OvsNatInit fails.

While deploying Tanzu Kubernetes (an Antrea-based solution) at a Broadcom
customer, we sometimes found that the kernel thread
OvsConntrackEntryCleaner was not started after a Windows node rebooted
unexpectedly. A similar issue can also be observed in a local Antrea
setup via Clean-AntreaNetwork.ps1, which removes the VMSwitch with
Remove-VMSwitch and re-creates it on the Windows node.

The local conntrack dump showed that OVS did not remove connection
entries even after their timeout had expired: the dump contained entries
created several hours earlier, in a state (TIME_WAIT) that should have
been deleted long before. At that time the OVS conntrack zone held 19k
entries, far below the upper limit. Running
"ovs-dpctl.exe flush-conntrack" then removed all of the conntrack
entries.

This patch makes sure that OvsInitConntrack returns the failure status
when its call to OvsNatInit fails. Previously it ran
OvsCleanupConntrack and still returned STATUS_SUCCESS in that case.

The Antrea team ran regression tests on a build that includes this
patch, and the tests passed. The issue of conntrack entries not timing
out was not seen again under the same reproduction conditions.

The fix should be applied to main and backported down to 2.17.

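For reviewers, the error-handling pattern described above can be
sketched in portable user-space C. This is only an illustration under
stated assumptions, not the driver code: status_t, ST_SUCCESS,
ST_NO_RESOURCES, fail_next_nat_init, nat_cleanup_calls and the
lower-case function names are hypothetical stand-ins for NTSTATUS,
OvsNatInit, OvsNatCleanup, OvsInitConntrack and OvsCleanupConntrack.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel status codes (assumption). */
typedef int status_t;
#define ST_SUCCESS       0
#define ST_NO_RESOURCES (-1)

static bool nat_init_done = false;      /* plays the role of OvsNatInitDone */
static bool fail_next_nat_init = false; /* knob to simulate an init failure */
static int  nat_cleanup_calls = 0;      /* counts NAT teardown calls */

/* Stand-in for OvsNatInit(): fails only when the knob is set. */
static status_t nat_init(void)
{
    return fail_next_nat_init ? ST_NO_RESOURCES : ST_SUCCESS;
}

/* Stand-in for OvsNatCleanup(). */
static void nat_cleanup(void)
{
    nat_cleanup_calls++;
}

/* Tear down NAT state only if it was set up, so repeated calls are safe. */
static void conntrack_cleanup(void)
{
    if (nat_init_done) {
        nat_cleanup();
        nat_init_done = false;
    }
}

/* Propagate the NAT init failure instead of falling through to success. */
static status_t conntrack_init(void)
{
    if (!nat_init_done) {
        status_t status = nat_init();
        if (status != ST_SUCCESS) {
            conntrack_cleanup();
            return status;
        }
        nat_init_done = true;
    }
    return ST_SUCCESS;
}
```

In this sketch a caller that checks the return value of
conntrack_init() sees the failure and can stop, rather than continuing
with a partially initialized conntrack module. The guard flag also keeps
the NAT teardown from running twice when cleanup is called repeatedly.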
Signed-off-by: Wilson Peng <[email protected]>
---
 datapath-windows/ovsext/Conntrack.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack.c b/datapath-windows/ovsext/Conntrack.c
index 39ba5cc10..fb503f786 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -40,6 +40,7 @@ static POVS_CT_ZONE_INFO zoneInfo = NULL;
 extern POVS_SWITCH_CONTEXT gOvsSwitchContext;
 static ULONG ctTotalEntries;
 static ULONG defaultCtLimit;
+static BOOLEAN OvsNatInitDone = FALSE;
 
 static __inline OvsCtFlush(UINT16 zone, struct ovs_key_ct_tuple_ipv4 *tuple);
 static __inline NDIS_STATUS
@@ -114,10 +115,13 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
         zoneInfo[i].limit = defaultCtLimit;
     }
 
-    status = OvsNatInit();
-
-    if (status != STATUS_SUCCESS) {
-        OvsCleanupConntrack();
+    if (OvsNatInitDone == FALSE) {
+        status = OvsNatInit();
+        if (status != STATUS_SUCCESS) {
+            OvsCleanupConntrack();
+            return status;
+        }
+        OvsNatInitDone = TRUE;
     }
 
     return STATUS_SUCCESS;
@@ -168,10 +172,14 @@ OvsCleanupConntrack(VOID)
     }
     OvsFreeMemoryWithTag(ovsCtBucketLock, OVS_CT_POOL_TAG);
     ovsCtBucketLock = NULL;
-    OvsNatCleanup();
+    if (OvsNatInitDone) {
+        OvsNatCleanup();
+        OvsNatInitDone = FALSE;
+    }
     NdisFreeSpinLock(&ovsCtZoneLock);
     if (zoneInfo) {
         OvsFreeMemoryWithTag(zoneInfo, OVS_CT_POOL_TAG);
+        zoneInfo = NULL;
     }
 }
 
@@ -1520,6 +1528,8 @@ OvsConntrackEntryCleaner(PVOID data)
     LOCK_STATE_EX lockState;
     BOOLEAN success = TRUE;
 
+    OVS_LOG_INFO("Start the OVS ConntrackEntry Cleaner system thread,"
+                 " context: %p", context);
     while (success) {
         if (context->exit) {
             break;
@@ -1541,6 +1551,7 @@ OvsConntrackEntryCleaner(PVOID data)
         KeWaitForSingleObject(&context->event, Executive, KernelMode,
                               FALSE, (LARGE_INTEGER *)&threadSleepTimeout);
     }
+    OVS_LOG_INFO("Terminate the OVS ConntrackEntry Cleaner system thread");
     PsTerminateSystemThread(STATUS_SUCCESS);
 }
 
-- 
2.39.2 (Apple Git-143)
