On 12/21/2017 12:27 PM, Jan Scheurich wrote:
>>>> I had a cheaper solution in mind that would accept the inaccuracy
>>>> of the accounting of tx cycles during rx batch processing (it
>>>> statistically slightly biases the load distribution towards the
>>>> more heavily loaded rx queues). And instead of attributing the tx
>>>> cycles outside the rx batching based on a stored rx queue per
>>>> packet, I would simply estimate the actual processing cycles per
>>>> rx queue per interval by distributing the total PMD processing
>>>> cycles in the interval over the polled rx queues, weighted by
>>>> their counted processing cycles.
>>>
>>> I'm not sure I fully understand your idea, but batches will likely
>>> be sent on one of the next polling cycles, not on the cycle where
>>> they were received. Moreover, send cycles are very different for
>>> different port types, and packets from a single rx queue may have
>>> different destinations (different types of output ports). I'm not
>>> sure it is possible to correctly count cycles in this case.
>>
>> I agree, tx processing can be very different depending on
>> destination, especially with larger packet sizes where some paths
>> may have a copy.
>
> I understand. There can be a big difference between physical and
> vhostuser ports, especially if the copy passes over the QPI. If the
> per-packet tx accounting is not causing too much overhead, it would
> be best.
>
> My suggestion for simplification was not aiming at the same accuracy
> as Ilya's patch. Perhaps rebalancing doesn't really require that
> accuracy. Often the new rx queue distribution will behave differently
> from the prediction anyway, because of changed batching
> characteristics.
>
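To make the estimate concrete, here is a minimal C sketch of the
weighting Jan describes: distribute the PMD's total processing cycles
for an interval over its polled rx queues in proportion to the cycles
counted against each queue. All names here are hypothetical, for
illustration only; this is not code from any of the patches.

#include <stdint.h>

static void
estimate_rxq_cycles(uint64_t total_pmd_proc_cycles,
                    const uint64_t *counted_cycles, /* per-rxq counted */
                    uint64_t *estimated_cycles,     /* per-rxq output */
                    int n_rxqs)
{
    uint64_t sum = 0;

    for (int i = 0; i < n_rxqs; i++) {
        sum += counted_cycles[i];
    }
    if (!sum) {
        return;             /* Idle interval: nothing to distribute. */
    }
    for (int i = 0; i < n_rxqs; i++) {
        /* Weight each rxq's share of the PMD total by its counted
         * load. Assumes the product does not overflow 64 bits at
         * per-interval cycle counts. */
        estimated_cycles[i] = total_pmd_proc_cycles
                              * counted_cycles[i] / sum;
    }
}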
Sure, that is why I had suggested perhaps making no assumption about
future batch opportunities and just adding the full cost of tx to each
rxq that has at least one packet in the flush. OTOH, the stats for the
current situation could then look worse than things actually are. Also,
it might not really simplify the code, as Ilya's patch just gives each
rxq some cycles per packet in a simple loop, whereas giving a full
cycle count per rxq would mean checking for duplicate rxqs etc. I guess
it's debatable. Anyway, I think both those schemes are a fix for
attributing cycles to the wrong rxq, or not counting them at all.

>>
>>> Have I missed something?
>>>
>>>> After all, Kevin's algorithm only needs comparable relative rx
>>>> queue loads across the PMDs to function. In this context I would
>>>> also suggest excluding idle polling cycles from the algorithm,
>>>> because they are not load related and always eat up the total
>>>> cycle budget of the PMD anyhow.
>>
>> Idle polling per rxq is not counted in the balance algorithm (for
>> the reason you say), but it's needed to calculate the total cycles
>> over the correct time period for providing the user with an rxq
>> percentage-of-pmd-used stat. Counting (2/3) and calculations (3/3)
>> are in the balance stats patches for reference.
>
> Sorry, I didn't check the patches carefully. I just assumed so,
> because you do count idle polling cycles per rx queue. But the total
> number of cycles per interval could easily be obtained from the PMD
> cycles. Perhaps a chance to simplify?
>

Good question, it had been my first thought too. The issue was that
the full PMD cycles are not in sync with the rxq measurement intervals
and are also controlled by the user with pmd-stats-clear. It was
simpler to extend the rxq counting code a bit to capture idle cycles
than to try to synchronize with the full PMD cycles, and I didn't want
to change their behaviour. Actually, with the rework to the data
structure, I think measuring idle/proc cycles for an rxq simplified
the existing code a little too.

thanks,
Kevin.

> BR, Jan
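For reference, a minimal C sketch contrasting the two tx accounting
schemes discussed above: per-packet shares in a simple loop, versus the
full flush cost charged to every rxq present in the flush, which needs
duplicate tracking. Names and the MAX_RXQS bound are hypothetical; this
is not code from Ilya's patch.

#include <stdbool.h>
#include <stdint.h>

#define MAX_RXQS 32         /* Hypothetical bound for the sketch. */

/* Scheme A: charge each packet's rxq an equal per-packet share of the
 * flush cost (a simple loop over the batch). */
static void
charge_tx_per_packet(uint64_t flush_cycles, const int *pkt_rxq,
                     int n_pkts, uint64_t *rxq_cycles)
{
    if (n_pkts <= 0) {
        return;
    }
    uint64_t per_pkt = flush_cycles / n_pkts;

    for (int i = 0; i < n_pkts; i++) {
        rxq_cycles[pkt_rxq[i]] += per_pkt;
    }
}

/* Scheme B: charge the full flush cost to every distinct rxq that
 * contributed at least one packet to the batch. */
static void
charge_tx_full_cost(uint64_t flush_cycles, const int *pkt_rxq,
                    int n_pkts, uint64_t *rxq_cycles)
{
    bool seen[MAX_RXQS] = { false };

    for (int i = 0; i < n_pkts; i++) {
        if (!seen[pkt_rxq[i]]) {
            seen[pkt_rxq[i]] = true;
            rxq_cycles[pkt_rxq[i]] += flush_cycles;
        }
    }
}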

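And a minimal sketch of the per-rxq percentage-of-pmd-used stat Kevin
mentions, again with hypothetical names: idle cycles do not feed the
balance weighting, but they complete the denominator for the same
measurement interval.

#include <stdint.h>

/* "% of PMD used" for one rxq over one interval: the rxq's processing
 * cycles divided by the PMD's total (processing + idle) cycles for the
 * same interval, which is why idle cycles must also be counted per
 * interval. */
static double
rxq_pmd_usage_pct(uint64_t rxq_proc_cycles, uint64_t pmd_proc_cycles,
                  uint64_t pmd_idle_cycles)
{
    uint64_t total = pmd_proc_cycles + pmd_idle_cycles;

    return total ? 100.0 * (double) rxq_proc_cycles / (double) total
                 : 0.0;
}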