Thanks everyone for working on this. For now, I've incorporated the changes from the incremental patch and re-split it for simpler review. Rebased on the current istokes/dpdk_merge and sent as v9. The documentation patch was sent together with the series, plus a change to NEWS.
Best regards, Ilya Maximets.

On 21.12.2017 16:42, Kevin Traynor wrote:
> On 12/21/2017 12:27 PM, Jan Scheurich wrote:
>>>>> I had a cheaper solution in mind that would accept the inaccuracy of
>>>>> the accounting of tx cycles during rx batch processing (it
>>>>> statistically slightly biases the load distribution towards the more
>>>>> heavily loaded rx queues). And instead of attributing the tx cycles
>>>>> outside the rx batching based on a stored rx queue per packet, I
>>>>> would simply estimate the actual processing cycles per rx queue per
>>>>> interval by distributing the total PMD processing cycles in the
>>>>> interval over the polled rx queues, weighted by their counted
>>>>> processing cycles.
>>>>
>>>> I'm not sure I fully understand your idea, but batches will likely be
>>>> sent on one of the next polling cycles, not on the cycle where they
>>>> were received. Moreover, send cycles are very different for different
>>>> port types, and packets from a single rx queue may have different
>>>> destinations (different types of output ports). I'm not sure it's
>>>> possible to correctly count cycles in this case.
>>>>
>>>
>>> I agree, tx processing can be very different depending on destination,
>>> especially with larger packet sizes where some paths may have a copy.
>>
>> I understand. There can be a big difference between physical and
>> vhostuser ports, especially if the copy passes over the QPI. And if the
>> per-packet tx accounting is not causing too much overhead, it would be
>> the best option.
>>
>> My suggestion for simplification was not aiming at the same accuracy as
>> Ilya's patch. Perhaps rebalancing doesn't really require that accuracy.
>> Often the new rx queue distribution will behave differently from the
>> prediction anyway because of changed batching characteristics.
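Jan's estimation idea above could be sketched roughly as follows. This is a hypothetical illustration (names and structure are mine, not from any patch in the thread): scale each rx queue's cycles counted during rx batch processing so that they sum to the PMD's total processing cycles for the interval.

```python
def estimate_rxq_cycles(rxq_measured_cycles, total_pmd_proc_cycles):
    """Distribute a PMD's total processing cycles for one interval over
    its polled rx queues, weighted by the cycles counted per queue
    during rx batch processing (hypothetical sketch, not OVS code)."""
    measured_sum = sum(rxq_measured_cycles.values())
    if measured_sum == 0:
        # Nothing was measured; attribute no load to any queue.
        return {rxq: 0 for rxq in rxq_measured_cycles}
    return {rxq: total_pmd_proc_cycles * cycles // measured_sum
            for rxq, cycles in rxq_measured_cycles.items()}

# Example: two queues measured at 30M and 10M rx cycles, but the PMD
# spent 60M processing cycles (rx + tx) in total over the interval.
est = estimate_rxq_cycles({"rxq0": 30_000_000, "rxq1": 10_000_000},
                          60_000_000)
```

The tx cycles that were never attributed to any queue are spread proportionally, which is exactly the "statistical bias toward more heavily loaded rx queues" Jan concedes.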
>>
>
> Sure, that is why I had suggested to perhaps make no assumption about
> future batching opportunities and just add the full cost of tx to each
> rxq that has at least one packet in the flush. OTOH, then the stats for
> the current situation could look worse than things actually are. Also,
> it might not really simplify the code, as Ilya's patch just gives each
> rxq some cycles per packet in a simple loop, whereas changing to give a
> full cycle count per rxq would mean checking for duplicate rxqs, etc.
> I guess it's debatable.
>
> Anyway, I think both those schemes are a fix for attributing cycles to
> the wrong rxq, or not counting them at all.
>
>>>
>>>> Am I missing something?
>>>>
>>>>> After all, Kevin's algorithm only needs comparable relative rx
>>>>> queue loads across the PMDs to function. In this context I would
>>>>> also suggest excluding idle polling cycles from the algorithm,
>>>>> because they are not load related and always eat up the total
>>>>> cycle budget of the PMD anyhow.
>>>
>>> Idle polling per rxq is not counted in the balance algorithm (for the
>>> reason you say), but it's needed to calculate the total cycles over
>>> the correct time period for providing the user with an rxq percentage
>>> of PMD used stats. Counting (2/3) and calculations (3/3) are in the
>>> balance stats patches for reference.
>>
>> Sorry, I didn't check the patches carefully. I just assumed so because
>> you do count idle polling cycles per rx queue. But the total number of
>> cycles per interval could easily be obtained from the PMD cycles.
>> Perhaps a chance to simplify?
>>
>
> Good question, it had been my first thought too. The issue was that the
> full PMD cycles are not in sync with the rxq measurement intervals and
> are also controlled by the user with pmd-stats-clear. It was simpler to
> just extend the rxq counting code a bit to capture idle cycles than to
> try and synchronize with the full PMD cycles, and I didn't want to
> change their behaviour.
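For contrast, the per-packet attribution Kevin describes ("gives each rxq some cycles per packet in a simple loop") might look roughly like this. Again a hypothetical sketch with invented names, not the actual patch: the cycles spent on a tx flush are split evenly across the packets in the batch and credited back to each packet's source rx queue.

```python
class Rxq:
    """Stand-in for an rx queue with a cycle counter (hypothetical)."""
    def __init__(self):
        self.cycles = 0

def attribute_tx_cycles(batch, tx_cycles):
    """Credit tx flush cycles to source rx queues, per packet.
    'batch' is a list of (packet, src_rxq) pairs."""
    if not batch:
        return
    per_pkt = tx_cycles // len(batch)
    for _pkt, src_rxq in batch:
        src_rxq.cycles += per_pkt

# Example: a 3-packet flush costing 3000 cycles; two packets came
# from q0 and one from q1, so q0 is credited twice as many cycles.
q0, q1 = Rxq(), Rxq()
attribute_tx_cycles([("p0", q0), ("p1", q0), ("p2", q1)], 3000)
```

The alternative Kevin floats, adding the full flush cost to every rxq represented in the batch, would need to deduplicate source rxqs first, which is the extra complexity he alludes to.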
> Actually, with the rework to the data structure, I think measuring
> idle/proc cycles for an rxq simplified the existing code a little too.
>
> thanks,
> Kevin.
>
>> BR, Jan
>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
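The user-facing stat discussed above, an rx queue's percentage of PMD usage, then follows directly from the per-rxq processing cycles and the total (processing + idle) cycles captured over the same interval. A minimal sketch under those assumptions (hypothetical names):

```python
def rxq_pmd_usage_pct(rxq_proc_cycles, pmd_total_cycles):
    """Return each rx queue's share of the PMD's total cycles
    (processing + idle polling) for one interval, as a percentage.
    Idle cycles are in the denominator only, so they never inflate
    any queue's load (hypothetical sketch, not OVS code)."""
    if pmd_total_cycles == 0:
        return {rxq: 0.0 for rxq in rxq_proc_cycles}
    return {rxq: 100.0 * c / pmd_total_cycles
            for rxq, c in rxq_proc_cycles.items()}

# Example: the PMD counted 100M cycles in the interval (80M processing
# plus 20M idle polling); rxq0 accounted for 60M of the processing.
pct = rxq_pmd_usage_pct({"rxq0": 60_000_000, "rxq1": 20_000_000},
                        100_000_000)
```

Keeping the idle cycles in the per-rxq counters, as Kevin's patches do, is what makes the denominator line up with the rxq measurement intervals rather than with the user-resettable full PMD counters.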
