On 8/18/20 4:00 PM, Ilya Maximets wrote:
> On 8/18/20 12:42 PM, K Venkata Kiran wrote:
>> Hi,
>>
>> We did further tests and found that it is indeed the conntrack global lock
>> introduced with the commit below that is causing the performance
>> degradation.
>>
>> We profiled with perf, with and without the commit below, and saw a
>> huge increase in pthread_mutex_lock samples. Our testbed had 4 PMD
>> threads handling traffic from two DPDK ports and various VHU ports.
>>
>> At the data structure level, we could see a major change in how
>> connections are stored in the conntrack structure.
>>
>> *Before :*
>>
>> struct conntrack_bucket {
>>     struct ct_lock lock;
>>     struct hmap connections OVS_GUARDED;
>>     struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
>>     struct ovs_mutex cleanup_mutex;
>>     long long next_cleanup OVS_GUARDED;
>> }
>>
>> *After :*
>>
>> struct conntrack {
>> -    /* Independent buckets containing the connections */
>> -    struct conntrack_bucket buckets[CONNTRACK_BUCKETS];
>>      ..
>> +    struct ovs_mutex ct_lock; /* Protects 2 following fields. */
>> +    struct cmap conns OVS_GUARDED;
>> +    struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
>> }
>>
>> Earlier, each ‘conntrack_bucket’ structure held the list of connections for
>> a given hash bucket. This was removed: all connections are now added to the
>> main ‘conntrack’ structure, and traversal of that list is protected by the
>> global ‘ct_lock’.
>>
>> We see that taking the global 'ct->ct_lock' for 'conn_update_expiration'
>> (which happens for every packet) accounts for much of the performance drop.
>>
>> Earlier, conn_key_hash mapped each created connection to its matching
>> hash bucket. Any update of state (mostly expiration time) involved
>> moving the connection back into the list of connections belonging to that
>> hash bucket. This was done under a bucket-level lock, and with 256 buckets
>> there was less contention.
>>
>> Now the single ‘ct->ct_lock’ adds more contention and is causing the
>> performance degradation.
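
To make the contention argument above concrete, here is a minimal sketch
contrasting the two locking schemes. It is not OVS code: the struct and
function names (striped_table, global_table, bucket_of, striped_update,
global_update) are hypothetical, plain pthread mutexes stand in for
ct_lock/ovs_mutex, and a single timestamp per bucket stands in for the
per-connection expiration lists. Only the 256-bucket figure comes from
the report.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS 256   /* Bucket count mentioned in the report. */

/* Hypothetical "before" layout: one lock per hash bucket. */
struct striped_table {
    pthread_mutex_t bucket_lock[N_BUCKETS];
    long long expiration[N_BUCKETS];   /* Stand-in for per-bucket state. */
};

/* Hypothetical "after" layout: one lock for every connection. */
struct global_table {
    pthread_mutex_t lock;
    long long expiration[N_BUCKETS];
};

/* Map a connection hash to its bucket index. */
size_t bucket_of(uint32_t hash)
{
    return hash % N_BUCKETS;
}

void striped_init(struct striped_table *t)
{
    for (size_t i = 0; i < N_BUCKETS; i++) {
        int rc = pthread_mutex_init(&t->bucket_lock[i], NULL);
        assert(rc == 0);
        (void) rc;
        t->expiration[i] = 0;
    }
}

void global_init(struct global_table *t)
{
    int rc = pthread_mutex_init(&t->lock, NULL);
    assert(rc == 0);
    (void) rc;
    for (size_t i = 0; i < N_BUCKETS; i++) {
        t->expiration[i] = 0;
    }
}

/* Per-packet update that only serializes threads hitting the same bucket. */
void striped_update(struct striped_table *t, uint32_t hash, long long now)
{
    size_t b = bucket_of(hash);

    pthread_mutex_lock(&t->bucket_lock[b]);
    t->expiration[b] = now;
    pthread_mutex_unlock(&t->bucket_lock[b]);
}

/* Per-packet update that serializes every thread on one mutex. */
void global_update(struct global_table *t, uint32_t hash, long long now)
{
    pthread_mutex_lock(&t->lock);
    t->expiration[bucket_of(hash)] = now;
    pthread_mutex_unlock(&t->lock);
}
```

With several threads calling striped_update() on different hashes, two
threads wait on each other only when their hashes land in the same
bucket. With global_update(), every call from every thread queues on the
same mutex, which would be consistent with the jump in
pthread_mutex_lock samples reported above.
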
>>
>> We also ran the test-conntrack benchmark:
>>
>> *1. The standard 1-thread test:*
>>
>> After commit
>> $ ./ovstest test-conntrack benchmark 1 14880000 32
>> conntrack: 2230 ms
>>
>> Before commit
>> $ ./ovstest test-conntrack benchmark 1 14880000 32
>> conntrack: 1673 ms
>>
>> *2. The multi-thread test (4 threads):*
>>
>> $ ./ovstest test-conntrack benchmark 4 33554432 32 1 (32 Million packets)
>> Before : conntrack: 15043 ms / conntrack: 14644 ms
>> After : conntrack: 71373 ms / conntrack: 65816 ms
>>
>> So as the number of connections grows and multiple threads run
>> conntrack_execute, the impact becomes more pronounced.
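
For reference, the figures above work out to a slowdown of roughly 1.33x
for the 1-thread run and roughly 4.5x to 4.7x for the 4-thread runs. A
small sketch of that arithmetic follows. The helper names are
hypothetical, and the per-packet figure assumes the reported time covers
the whole run and that the second benchmark argument is the packet
count, so it is only meaningful for the 1-thread case.

```c
/* Ratio of "after" to "before" run time; above 1.0 means slower. */
double slowdown(double before_ms, double after_ms)
{
    return after_ms / before_ms;
}

/* Average cost per packet in nanoseconds for a run of n_pkts packets. */
double ns_per_packet(double run_ms, double n_pkts)
{
    return run_ms * 1e6 / n_pkts;
}
```

For example, slowdown(1673, 2230) covers the 1-thread test, and
slowdown(15043, 71373) and slowdown(14644, 65816) cover the two 4-thread
samples.
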
>
> Thanks for the testing and investigation. I fully agree that userspace
> conntrack is not in good shape, especially in terms of multi-threading and
> its locking scheme. And, unfortunately, it's not actively developed right now.
>
>> Are there any changes that are expected to fix this performance issue in the
>> near future?
>
> I'm not aware of any ongoing development in this area.
>
>> Do we have conntrack-related performance tests that are run with every
>> release?
>
> I'm not aware of any specific conntrack-related performance tests.
> We are lucking performance tests in many areas, actually. We do not
s/lucking/lacking/
> have any public infrastructure to run these tests by ourselves.
>
> Volunteers are always welcome.
>
> Best regards, Ilya Maximets.
>
>>
>> Thanks
>> Kiran
>>
>> *From:* K Venkata Kiran
>> *Sent:* Thursday, August 6, 2020 4:20 PM
>> *To:* [email protected]; [email protected]; Darrell Ball
>> <[email protected]>; [email protected]
>> *Cc:* Anju Thomas <[email protected]>; K Venkata Kiran
>> <[email protected]>
>> *Subject:* Performance drop with conntrack flows
>>
>> Hi,
>>
>> We see a 40% traffic drop with UDP traffic over VxLAN and a 20% traffic drop
>> with UDP traffic over MPLSoGRE between OVS 2.8.2 and OVS 2.12.1.
>>
>> We narrowed the performance drop in our test down to the commit below;
>> backing out the commit fixed the problem.
>>
>> The commit of concern is :
>> https://github.com/openvswitch/ovs/commit/967bb5c5cd9070112138d74a2f4394c50ae48420
>> commit 967bb5c5cd9070112138d74a2f4394c50ae48420
>> Author: Darrell Ball <[email protected]>
>> Date: Thu May 9 08:15:07 2019 -0700
>> conntrack: Add rcu support.
>>
>> We suspect that taking the ‘ct->ct_lock’ lock for ‘conn_update_state’ and
>> for conn_key_lookup could be causing the issue.
>>
>> Has anyone else noticed this issue, and are there any pointers on a fix?
>> We could not find any obvious commit that solves it. Any guidance would
>> help.
>>
>> Thanks
>>
>> Kiran
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev