>-----Original Message----- >From: [email protected] [mailto:[email protected]] >On Behalf Of Ilya Maximets >Sent: Wednesday, August 22, 2018 8:33 AM >To: Stokes, Ian <[email protected]>; Vishal Deep Ajmera ><[email protected]>; [email protected] >Subject: Re: [ovs-dev] [PATCH v6] dpif-netdev: Avoid reordering of packets in >a batch with same megaflow > >master. > >Hi, Ian, everyone. >It's ok to merge. I still don't like the change, but I'll get along with it. >P.S. Current version can not be applied cleanly, minor rebase required. > >Below text is not directly related to this patch. >--- >Meanwhile, in general, I think that the main processing part >of dpif-netdev (dp_netdev_input__) is overcomplicated. It looks >like we have 4 different levels of packet batching (some of them >exists at the same time): >1. rx batches >2. Per-flow batches >3. new flow maps >4. output batches. >And we're only constantly batching packets in a different data >structures. That's frustrating. > >I'd like to refactor it in a way of more iterative and sequential >processing like using "keys_map" approach from "dpcls_lookup" and >call different processing functions until "keys_map" is not empty. >Each processing function may use "ULLONG_FOR_EACH_1" for this >"key_map", set up rules for matched packets end disable them >from "keys_map" using "ULLONG_SET0". >Right now we have 5 options to classify packets: >1. has_flow_mark() >2. emc_lookup() >3. smc_lookup_batch() >4. dpcls_lookup() >5. handle_packet_upcall() >some of them works with packet batches, some with individual packets. >IMHO, if we'll make a plain pipeline from them calling the next >stage until there are some unhandled packets in "keys_map", it could >be much simpler in compare to current call tree with a mess of >processing functions, each of which implemented in a different way. >In the end we'll be able to push all the packets to per-flow batches >and execute actions. > >In fact, this will effectively resolve current reordering issue too. > >I'm not sure if the above text is parsable. =) Sorry, if not. > >Any thoughts? >Anyway, I'm going to try.
[Wang, Yipeng] Hi, Ilya, Your suggestion of refactoring current code would be nice. Here is my two cents: 1. Previously when I implemented SMC cache, I tried to create a key array, and pass it to EMC_batch processing and SMC batch processing as two stages(mostly for performance reasons, batching EMC enables me to do software pipelining). The issue (at least what I observed) is miniflow_extracting to a full 32-element key array before EMC processing occupies so many cache lines thus it is costly. Current implementation reuses the same key cache line to do miniflow_extract if previous key hits EMC. This helps cache locality on the mostly-EMC-hit case. So if we assume EMC-hit is common, you may consider EMC lookup and miniflow extract separately from others for performance reasons. 2. Using masks to pass batches around is a good improvement on code readability. However, I don't know a convenient way to implement software pipelining with masks. So sometimes a compact pointer array may still benefit if software pipelining is needed. Please include me once you have a draft. Thanks Yipeng _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
