Thanks everyone for working on this. For now, I've incorporated the changes from the incremental patch and re-split it for simpler review. Rebased on the current istokes/dpdk_merge and sent as v9. The documentation patch was sent together with the series, plus a change to NEWS.
Best regards, Ilya Maximets.

On 21.12.2017 16:42, Kevin Traynor wrote:
> On 12/21/2017 12:27 PM, Jan Scheurich wrote:
>>>>> I had a cheaper solution in mind that would accept the inaccuracy of
>>>>> the accounting of tx cycles during rx batch processing (it
>>>>> statistically slightly biases the load distribution towards the more
>>>>> heavily loaded rx queues). And instead of attributing the tx cycles
>>>>> outside the rx batching based on a stored rx queue per packet, I
>>>>> would simply estimate the actual processing cycles per rx queue per
>>>>> interval by distributing the total PMD processing cycles in the
>>>>> interval over the polled rx queues, weighted by their counted
>>>>> processing cycles.
>>>>
>>>> I'm not sure I fully understand your idea, but batches will likely be
>>>> sent on one of the next polling cycles, not on the cycle where they
>>>> were received. Moreover, send cycles are very different for different
>>>> port types, and packets from a single rx queue may have different
>>>> destinations (different types of output ports). I'm not sure it's
>>>> possible to correctly count cycles in this case.
>>>>
>>>
>>> I agree, tx processing can be very different depending on destination,
>>> especially with larger packet sizes where some paths may have a copy.
>>
>> I understand. There can be a big difference between physical and
>> vhostuser ports, especially if the copy passes over the QPI. And if the
>> per-packet tx accounting is not causing too much overhead, it would be
>> the best option.
>>
>> My suggestion for simplification was not aiming at the same accuracy as
>> Ilya's patch. Perhaps rebalancing doesn't really require that accuracy.
>> Often the new rx queue distribution will behave differently from the
>> prediction anyway because of changed batching characteristics.
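Jan's estimation idea above could be sketched roughly as follows. This is a hypothetical illustration (names and structure are mine, not from any patch in the thread): scale each rx queue's cycles counted during rx batch processing so that they sum to the PMD's total processing cycles for the interval.

```python
def estimate_rxq_cycles(rxq_measured_cycles, total_pmd_proc_cycles):
    """Distribute a PMD's total processing cycles for one interval over
    its polled rx queues, weighted by the cycles counted per queue
    during rx batch processing (hypothetical sketch, not OVS code)."""
    measured_sum = sum(rxq_measured_cycles.values())
    if measured_sum == 0:
        # Nothing was measured; attribute no load to any queue.
        return {rxq: 0 for rxq in rxq_measured_cycles}
    return {rxq: total_pmd_proc_cycles * cycles // measured_sum
            for rxq, cycles in rxq_measured_cycles.items()}

# Example: two queues measured at 30M and 10M rx cycles, but the PMD
# spent 60M processing cycles (rx + tx) in total over the interval.
est = estimate_rxq_cycles({"rxq0": 30_000_000, "rxq1": 10_000_000},
                          60_000_000)
```

The tx cycles that were never attributed to any queue are spread proportionally, which is exactly the "statistical bias toward more heavily loaded rx queues" Jan concedes.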
>>
>
> Sure, that is why I had suggested to perhaps make no assumption about
> future batching opportunities and just add the full cost of tx to each
> rxq that has at least one packet in the flush. OTOH, then the stats for
> the current situation could look worse than things actually are. Also,
> it might not really simplify the code, as Ilya's patch just gives each
> rxq some cycles per packet in a simple loop, whereas changing to give a
> full cycle count per rxq would mean checking for duplicate rxqs, etc.
> I guess it's debatable.
>
> Anyway, I think both those schemes are a fix for attributing cycles to
> the wrong rxq, or not counting them at all.
>
>>>
>>>> Am I missing something?
>>>>
>>>>> After all, Kevin's algorithm only needs comparable relative rx
>>>>> queue loads across the PMDs to function. In this context I would
>>>>> also suggest excluding idle polling cycles from the algorithm,
>>>>> because they are not load related and always eat up the total
>>>>> cycle budget of the PMD anyhow.
>>>
>>> Idle polling per rxq is not counted in the balance algorithm (for the
>>> reason you say), but it's needed to calculate the total cycles over
>>> the correct time period for providing the user with an rxq percentage
>>> of PMD used stats. Counting (2/3) and calculations (3/3) are in the
>>> balance stats patches for reference.
>>
>> Sorry, I didn't check the patches carefully. I just assumed so because
>> you do count idle polling cycles per rx queue. But the total number of
>> cycles per interval could easily be obtained from the PMD cycles.
>> Perhaps a chance to simplify?
>>
>
> Good question, it had been my first thought too. The issue was that the
> full PMD cycles are not in sync with the rxq measurement intervals and
> are also controlled by the user with pmd-stats-clear. It was simpler to
> just extend the rxq counting code a bit to capture idle cycles than to
> try and synchronize with the full PMD cycles, and I didn't want to
> change their behaviour.
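For contrast, the per-packet attribution Kevin describes ("gives each rxq some cycles per packet in a simple loop") might look roughly like this. Again a hypothetical sketch with invented names, not the actual patch: the cycles spent on a tx flush are split evenly across the packets in the batch and credited back to each packet's source rx queue.

```python
class Rxq:
    """Stand-in for an rx queue with a cycle counter (hypothetical)."""
    def __init__(self):
        self.cycles = 0

def attribute_tx_cycles(batch, tx_cycles):
    """Credit tx flush cycles to source rx queues, per packet.
    'batch' is a list of (packet, src_rxq) pairs."""
    if not batch:
        return
    per_pkt = tx_cycles // len(batch)
    for _pkt, src_rxq in batch:
        src_rxq.cycles += per_pkt

# Example: a 3-packet flush costing 3000 cycles; two packets came
# from q0 and one from q1, so q0 is credited twice as many cycles.
q0, q1 = Rxq(), Rxq()
attribute_tx_cycles([("p0", q0), ("p1", q0), ("p2", q1)], 3000)
```

The alternative Kevin floats, adding the full flush cost to every rxq represented in the batch, would need to deduplicate source rxqs first, which is the extra complexity he alludes to.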
> Actually, with the rework to the data structure, I think measuring
> idle/proc cycles for an rxq simplified the existing code a little too.
>
> thanks,
> Kevin.
>
>> BR, Jan
>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
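The user-facing stat discussed above, an rx queue's percentage of PMD usage, then follows directly from the per-rxq processing cycles and the total (processing + idle) cycles captured over the same interval. A minimal sketch under those assumptions (hypothetical names):

```python
def rxq_pmd_usage_pct(rxq_proc_cycles, pmd_total_cycles):
    """Return each rx queue's share of the PMD's total cycles
    (processing + idle polling) for one interval, as a percentage.
    Idle cycles are in the denominator only, so they never inflate
    any queue's load (hypothetical sketch, not OVS code)."""
    if pmd_total_cycles == 0:
        return {rxq: 0.0 for rxq in rxq_proc_cycles}
    return {rxq: 100.0 * c / pmd_total_cycles
            for rxq, c in rxq_proc_cycles.items()}

# Example: the PMD counted 100M cycles in the interval (80M processing
# plus 20M idle polling); rxq0 accounted for 60M of the processing.
pct = rxq_pmd_usage_pct({"rxq0": 60_000_000, "rxq1": 20_000_000},
                        100_000_000)
```

Keeping the idle cycles in the per-rxq counters, as Kevin's patches do, is what makes the denominator line up with the rxq measurement intervals rather than with the user-resettable full PMD counters.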
