Re: [ovs-dev] [RFC 1/4] dpif-netdev: Refactor datapath flow cache

Wang, Yipeng1 Mon, 26 Feb 2018 10:47:34 -0800

Bhanu, thanks for the comment. Please see my comments inlined.

And Jan please feel free to add more.


>-----Original Message-----
>From: Bodireddy, Bhanuprakash
>Sent: Tuesday, February 20, 2018 1:10 PM
>To: Wang, Yipeng1 <[email protected]>; [email protected]; 
>[email protected]
>Cc: Tai, Charlie <[email protected]>
>Subject: RE: [ovs-dev] [RFC 1/4] dpif-netdev: Refactor datapath flow cache
>
>Hi Yipeng,
>
>Thanks for the RFC series. This patch series need to be rebased.
>I applied this on an older commit to do initial testing. Some comments below.
>
>I see that DFC cache is implemented in similar lines of EMC cache except that 
>it holds
>Million entries and uses more bits of RSS hash to index in to the Cache. 
[Wang, Yipeng] 
Conceptually, DFC/CD is much memory efficient than EMC since it does not store 
the full key.

>DPCLS lookup is expensive and consumes 30% of total cycles in some test cases 
>and DFC
>Cache will definitely reduce some pain there.
>
>On the memory foot print:
>
>On Master,
>    EMC  entry size = 592 bytes
>               8k entries = ~4MB.
>
>With this patch,
>     EMC entry size = 256 bytes
>              16k entries = ~4MB.
>
>I like the above reduction in flow key size, keeping the entry size to 
>multiple of cache line and still keeping the overall EMC size to
>~4MB with more EMC entries.
>
>However my concern is DFC cache size. As DFC cache is million entries and 
>consumes ~12 MB for each PMD thread, it might not fit in to
>L3 cache. Also note that in newer platforms L3 cache is shrinking and L2 is 
>slightly increased (eg: Skylake has only 1MB L2 and 19MB L3
>cache).
>
[Wang, Yipeng] 
Yes, this is also our concern. The following (4/4) patch introduces indirect 
table for this reason. 

>Inspite of the memory footprint I still think DFC cache improves switching 
>performance as it is lot less expensive than invoking
>dpcls_lookup() as the later involves more expensive hash computation and 
>subtable traversal. It would be nice if there is more testing
>done with real VNFs to see that this patch doesn't cause cache thrashing and 
>suffer from memory bottlenecks.
>
[Wang, Yipeng] 
I don't have real VNF benchmarking set, but will it be useful if we do test 
with some synthetic cache pirating application to emulate the effect? 
 
>
>[BHANU]
>I am not sure if this is good idea to simplify EMC by using 1-way associative 
>instead of current 2 way associative implementation.
>I prefer to leave the current approach as-is unless we have strong data to 
>prove it otherwise.
>This comment applies to below code changes w.r.t to EMC lookup and insert.
>
[Wang, Yipeng] 
I have synthetic tests showing simpler EMC works much better, while I also have 
other sets of synthetic tests showing the 2way EMC works much better.
It eventually depends on the use cases.  We had some discussion at this thread: 
https://mail.openvswitch.org/pipermail/ovs-dev/2017-December/342197.html

>>The maximum size of the EMC flow key is limited to 256 bytes to reduce the
>>memory footprint. This should be sufficient to hold most real life packet flow
>>keys. Larger flows are not installed in the EMC.
>
>+1
>
>>
>> reload:
>>     pmd_alloc_static_tx_qid(pmd);
>>
>>@@ -4166,8 +4282,7 @@ reload:
>>         }
>>
>>         if (lc++ > 1024) {
>>-            bool reload;
>>-
>>+            dfc_slow_sweep(&pmd->flow_cache);
>[BHANU]  I need to better understand the usage of RCUs. But I am wondering why 
>the sweep function
>Isn't under the below !rcu_try_quiesce() condition?

[Wang, Yipeng] 
The condition is to see if this pmd can be successfully quiesced once in this 
iteration. And sweeping will evict EMC entries and register rcu callbacks to 
free megaflow entry later.  Then putting this function under the condition 
means that we sweep and register new call back only if this PMD successfully 
quiesced this round.
I think functionally it should be fine either put sweep inside or outside the 
condition. But if outside the condition, there may be callbacks accumulated and 
cannot be called since this PMD may never be able to enter quiesced state once. 
 So, I think we may want to change the code to keep the sweep under the 
condition.
Others may know better of this, please comment.
>
>>@@ -5039,7 +5154,7 @@ emc_processing(struct dp_netdev_pmd_thread
>>*pmd,
>>         }
>>         miniflow_extract(packet, &key->mf);
>>         key->len = 0; /* Not computed yet. */
>>-        /* If EMC is disabled skip hash computation and emc_lookup */
>>+        /* If DFC is disabled skip hash computation and DFC lookup */
>[BHANU] Why would DFC ever be disabled by user? This was done earlier for EMC 
>as in EMC saturation case, there was penalty doing
>emc insertion
>and emc_lookup() (that involved costly memcmp()) and disabling EMC showed some 
>10% advantage for some use cases.
>
>This might not apply for DFC and I assume DFC shouldn't be configurable option?

[Wang, Yipeng] 
Yes, this part of code should be tweaked a bit if we want DFC to be always on.



Thanks
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFC 1/4] dpif-netdev: Refactor datapath flow cache

Reply via email to