On 8/18/20 4:00 PM, Ilya Maximets wrote:
> On 8/18/20 12:42 PM, K Venkata Kiran wrote:
>> Hi,
>>
>> We did further tests and found that it is indeed the conntrack global lock 
>> introduced by the commit below that is causing the performance degradation.
>>
>> We ran perf analysis with and without the commit and saw a huge increase 
>> in pthread_mutex_lock samples.  In our testbed, 4 PMD threads handled 
>> traffic from two DPDK ports and various vhost-user (VHU) ports.
>>
>> At the data structure level, there is a major change in how connections 
>> are stored in the conntrack structure.
>>
>> *Before :*
>>
>> conntrack_bucket {
>>               struct ct_lock lock;
>>               struct hmap connections OVS_GUARDED;
>>               struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
>>               struct ovs_mutex cleanup_mutex;
>>               long long next_cleanup OVS_GUARDED;
>> }
>>
>> *After :*
>>
>> struct conntrack {
>> -    /* Independent buckets containing the connections */
>> -    struct conntrack_bucket buckets[CONNTRACK_BUCKETS];
>> ..
>> +    struct ovs_mutex ct_lock; /* Protects 2 following fields. */
>> +    struct cmap conns OVS_GUARDED;
>> +    struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
>> }
>>
>> Earlier, the ‘conntrack_bucket’ structure held the list of connections for 
>> a given hash bucket.  This was removed: all connections are now added to 
>> the main ‘conntrack’ structure, and traversal of that list is protected by 
>> the global conntrack ‘ct_lock’.
>>
>> We see that taking the global 'ct->ct_lock' in 'conn_update_expiration' 
>> (which happens for every packet) accounts for much of the performance drop.
>>
>> Earlier, conn_key_hash mapped each new connection to its matching hash 
>> bucket.  Any state update (mostly of the expiration time) moved the 
>> connection back into the list of connections belonging to that bucket. 
>> This was done under the bucket-level lock, and with 256 buckets there was 
>> little contention.
>>
>> Now the single ‘ct->ct_lock’ adds far more contention and causes the 
>> performance degradation.
>>
>> We also ran the test-conntrack benchmark.
>>
>> *1. The standard 1 thread test :*
>>
>> After commit
>> $ ./ovstest test-conntrack benchmark 1 14880000 32
>> conntrack:   2230 ms
>>
>> Before commit
>> $ ./ovstest test-conntrack benchmark 1 14880000 32
>> conntrack:   1673 ms
>>
>> *2. We also did multiple thread test (4 threads) *
>>
>> $ ./ovstest test-conntrack benchmark 4 33554432 32 1    (32 Million packets)
>> Before : conntrack:  15043 ms / conntrack:  14644 ms
>> After  : conntrack:  71373 ms / conntrack:  65816 ms
>>
>> So as the number of connections grows and multiple threads call 
>> conntrack_execute, the impact becomes even more pronounced.
> 
> Thanks for testing and investigation.  I fully agree that userspace conntrack
> is not in a good shape, especially in terms of multi-threading and locking
> scheme.  And, unfortunately, it's not actively developed right now.
> 
>> Are there any changes that are expected to fix this performance issue in the 
>> near future?
> 
> I'm not aware of any ongoing development in this area.
> 
>> Do we have conntrack-related performance tests that are run with every 
>> release?
> 
> I'm not aware of any specific conntrack-related performance tests.
> We are lucking performance tests in many areas, actually.  We do not

s/lucking/lacking/

> have any public infrastructure to run these tests by ourselves.
> 
> Volunteers are always welcome.
> 
> Best regards, Ilya Maximets.
> 
>>
>> Thanks
>> Kiran
>>
>> *From:* K Venkata Kiran
>> *Sent:* Thursday, August 6, 2020 4:20 PM
>> *To:* [email protected]; [email protected]; Darrell Ball 
>> <[email protected]>; [email protected]
>> *Cc:* Anju Thomas <[email protected]>; K Venkata Kiran 
>> <[email protected]>
>> *Subject:* Performance drop with conntrack flows
>>
>> Hi,
>>
>> We see a 40% traffic drop with UDP traffic over VxLAN and a 20% traffic 
>> drop with UDP traffic over MPLSoGRE between OVS 2.8.2 and OVS 2.12.1.
>>
>> We narrowed the performance drop in our test down to the commit below; 
>> backing out the commit fixed the problem.
>>
>> The commit of concern is :
>> https://github.com/openvswitch/ovs/commit/967bb5c5cd9070112138d74a2f4394c50ae48420
>> commit 967bb5c5cd9070112138d74a2f4394c50ae48420
>> Author: Darrell Ball <[email protected]>
>> Date:   Thu May 9 08:15:07 2019 -0700
>>  conntrack: Add rcu support.
>>
>> We suspect the ‘ct->ct_lock’ taken for ‘conn_update_state’ and for 
>> conn_key_lookup could be causing the issue.
>>
>> Has anyone else noticed this issue, and are there any pointers to a fix? 
>> We could not find an obvious commit that solves it.  Any guidance would 
>> help.
>>
>> Thanks
>>
>> Kiran

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev