Re: [Qemu-devel] [PATCH] e1000e: Don't zero out buffer address in rx descriptor

Dmitry Fleytman Wed, 19 Oct 2016 03:18:42 -0700

> On 19 Oct 2016, at 13:07 PM, Kevin Wolf <kw...@redhat.com> wrote:
> 
> Am 19.10.2016 um 09:57 hat Dmitry Fleytman geschrieben:
>> 
>>    On 19 Oct 2016, at 10:25 AM, Kevin Wolf <kw...@redhat.com> wrote:
>> 
>>    Am 19.10.2016 um 08:48 hat Dmitry Fleytman geschrieben:
>> 
>>           Another related thing that I noticed while debugging this and
>>        turning on
>>           tracing is that the interrupt throttling timers kept firing even if
>>           there was no activity at all. Something might be wrong, there, too.
>> 
>>           Next thing I wondered why throttling was enabled at all because the
>>        spec
>>           says the default is 0 (turned off). So one thing that I'm pretty
>>        sure is
>>           just a misunderstanding is the following defintion:
>> 
>>           #define E1000E_MIN_XITR     (500) /* No more then 7813 interrupts
>>        per
>>                                               second according to spec
>>        10.2.4.2 */
>> 
>> 
>>           As I understand it, the spec is just giving an example there and
>>        lower
>>           values are valid as well. At the very least, 0 should be accepted 
>> as
>>        a
>>           special case because it means "disabled" and it's specified to be
>>        the
>>           default.
>> 
>> 
>>        Right, this according to the spec this value should be 0 by default 
>> and
>>        throttling should be disabled.
>> 
>>        Current device implementation does not allow specification of
>>        throttling interval less than 500 and treats interval 0 as throttling
>>        enabled with interval 500.
>> 
>>        This is done by intention because according to the spec (10.2.4.2)
>>        device cannot produce more than 7813 interrupts per second even when
>>        throttling is disabled. Therefore, even in case of interrupt storm
>>        (continuous interrupt re-injection by device), number of interrupts
>>        produced by device is limited and CPU (driver) has enough time to do
>>        its job and handle problematic interrupt state.
>> 
>> 
>>    I think you're misinterpreting the spec here. This is the paragraph
>>    we're talking about, right?
>> 
>>       For example, if the interval is programmed to 500 (decimal), the
>>       82574 guarantees the CPU is not interrupted by it for 128 µs from
>>       the last interrupt. The maximum observable interrupt rate from the
>>       82574 should never exceed 7813 interrupts/sec.
>> 
>>    It says "for example", so this is just demonstrating how you can
>>    calculate the effects of a specific throttling setting. It says that
>>    _if_ you set ITR to 500, you get an interrupt at most every
>>    500 * 256 ns = 128 µs. And 1 / 128 µs = 7821.5 Hz, so this is the
>>    effective maximum frequency that _this specific_ ITR setting allows.
>> 
>>    I also don't think it would make any sense for hardware to be unable to
>>    trigger interrupts more often than that. Triggering an interrupt is not
>>    a complex operation that involves a lot of calculation or anything.
>> 
>> 
>> Hi Kevin,
>> 
>> Yes, I assume that sentence
>> 
>> “The maximum observable interrupt rate from the
>> 82574 should never exceed 7813 interrupts/sec."
>> 
>> is not a related to a specific case, but describes a generic limitation,
>> however it might be I’m misreading the spec indeed.
> 
> For me everything hints at this being only an example: Not only do the
> numbers match the example made in the previous sentence (which is
> explicitly called an example) and look weird as a real limit, but it's
> also in the same paragraph as the explicit example and the spec is
> generally good at starting a new paragraph when talking about a new
> aspect.


I tend to agree.

> 
> I don't care enough to actually make you change anything, but I wanted
> you to be aware that the interpretation of the spec as coded into our
> emulation isn't clear at all (in fact, I would think it's clear that
> it's _not_ meant this way) and that real hardware probably doesn't do
> the same thing as we do.

Thanks, Kevin.

> 
> What we're doing may still have merit, as a workaround for a guest
> driver bug.
> 
>>        Opposed to this, virtual device is able to raise interrupts with rate
>>        limited by CPU speed only therefore driver has no chance to fix
>>        interrupt storm condition.
>> 
>>        Windows e1000e drivers rely on upper limit for number of interrupts
>>        per second in some cases and absence of this limit leads to infinite
>>        interrupt storms.
>> 
>>        To summarise, while usage of throttling mechanisms is a little bit
>>        different from what specification says, effective emulated device
>>        behavior is totally compliant to the real device.
>> 
>> 
>>    So Windows doesn't configure ITR (i.e. it is 0) even though it can't
>>    handle unlimited interrupts? That would be a driver bug then, and
>>    perhaps an important enough one to keep a workaround in our code. But
>>    then let's be explicit that this is a workaround for a Windows bug and
>>    not mandated by the spec.
>> 
>>    I'm not sure in what setup you produced this error, but possibly a
>>    reason why this doesn't happen with real hardware isn't the NIC itself
>>    but the backend: Communication with the host can obviously be faster
>>    than talking to a physical network (so if you were doing the latter, the
>>    rate in the VM wouldn't be limited by the CPU, but by the physical
>>    network).
>> 
>> 
>> This issue is reproduced on device disable and not related 
>> to intensive device/backend communication. One RX packet with
>> right timing is enough to trigged the problem.
>> 
>> The same issue was fixed in e1000 device some time ago as well.
> 
> Commit 9596ef7c was good in flagging it as a guest driver bug. Only a
> later series brought in the questionable spec interpretation.
> 
> Kevin

Re: [Qemu-devel] [PATCH] e1000e: Don't zero out buffer address in rx descriptor

Reply via email to