>-----Original Message-----
>From: [email protected] [mailto:[email protected]] 
>On Behalf Of Ilya Maximets
>Sent: Wednesday, August 22, 2018 8:33 AM
>To: Stokes, Ian <[email protected]>; Vishal Deep Ajmera 
><[email protected]>; [email protected]
>Subject: Re: [ovs-dev] [PATCH v6] dpif-netdev: Avoid reordering of packets in 
>a batch with same megaflow
>
>master.
>
>Hi, Ian, everyone.
>It's ok to merge. I still don't like the change, but I'll get along with it.
>P.S. Current version can not be applied cleanly, minor rebase required.
>
>Below text is not directly related to this patch.
>---
>Meanwhile, in general, I think that the main processing part
>of dpif-netdev (dp_netdev_input__) is overcomplicated. It looks
>like we have 4 different levels of packet batching (some of them
>exists at the same time):
>1. rx batches
>2. Per-flow batches
>3. new flow maps
>4. output batches.
>And we're only constantly batching packets in a different data
>structures. That's frustrating.
>
>I'd like to refactor it in a way of more iterative and sequential
>processing like using "keys_map" approach from "dpcls_lookup" and
>call different processing functions until "keys_map" is not empty.
>Each processing function may use "ULLONG_FOR_EACH_1" for this
>"key_map", set up rules for matched packets end disable them
>from "keys_map" using "ULLONG_SET0".
>Right now we have 5 options to classify packets:
>1. has_flow_mark()
>2. emc_lookup()
>3. smc_lookup_batch()
>4. dpcls_lookup()
>5. handle_packet_upcall()
>some of them works with packet batches, some with individual packets.
>IMHO, if we'll make a plain pipeline from them calling the next
>stage until there are some unhandled packets in "keys_map", it could
>be much simpler in compare to current call tree with a mess of
>processing functions, each of which implemented in a different way.
>In the end we'll be able to push all the packets to per-flow batches
>and execute actions.
>
>In fact, this will effectively resolve current reordering issue too.
>
>I'm not sure if the above text is parsable. =) Sorry, if not.
>
>Any thoughts?
>Anyway, I'm going to try.

[Wang, Yipeng] 
Hi, Ilya,

Your suggestion of refactoring current code would be nice. Here is my two cents:

1. Previously when I implemented SMC cache, I tried to create a key array, and 
pass it to EMC_batch processing and SMC batch processing as two stages(mostly 
for performance reasons, batching EMC enables me to do software pipelining). 
The issue (at least what I observed) is miniflow_extracting to a full 
32-element key array before EMC processing occupies so many cache lines thus it 
is costly. Current implementation reuses the same key cache line to do 
miniflow_extract if previous key hits EMC. This helps cache locality on the 
mostly-EMC-hit case.  So if we assume EMC-hit is common, you may consider EMC 
lookup and miniflow extract separately from others for performance reasons.

2. Using masks to pass batches around is a good improvement on code 
readability. However, I don't know a convenient way to implement software 
pipelining with masks. So sometimes a compact pointer array may still benefit 
if software pipelining is needed.

Please include me once you have a draft.

Thanks
Yipeng


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to