There are quite a lot of issues with the patch provided. See below.
On 12 Mar 2018, at 07:07, Sylvia Else wrote:
> There's a race condition in the Linux e1000 driver, in that it enables
> receiving on the interface before it initialises the buffer size and
> addresses. The window is short, of the order of a millisecond, and probably
> does little harm with real hardware.
The buffer size, which is set via RCTL.BSIZE, has a default value of 2048.
Under no circumstances it can be zero. So I assume that you are talking about
e1000 driver enabling receiving before allocating rx buffers and assigning them
to rx descriptors. Indeed, e1000_configure allocates rx buffers after it
enables receiving via e1000_configure_rx, but there is a catch:
e1000_configure_rx sets both RDH and RDT to zeros, which should totally prevent
entering the rx descriptor processing loop. Only when an actual buffer
allocation function (like e1000_allloc_rx_buffer) advances RDT the rx
descriptor, for which the buffer has just been allocated, will be read by e1000
(hardware or emulated).
>
> Unfortunately, when in bridge networking mode, if there are any broadcast
> packets around, the VirtualBox DevE1000 module will try to send one during
> that window. The hardware specification is silent as to what the hardware
> should do in this situation, but the effect of the interaction between
> VirtualBox and the driver is to render the network non-functional. The
> dependence on broadcast packets to trigger the failure is no doubt why many
> people have no problem with it.
I am not trying to say that there is no problem, but your explanation of the
cause does not convince me. The idea of improperly initialised descriptors
getting stuck in rx descriptor cache seems very plausible to me, but I doubt
that e1000 driver initialisation sequence is to blame, hence I do not believe
the proposed fix solves the actual issue. With what version/flavour of linux
guest have you seen this issue? I am looking at
https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/intel/e1000/e1000_main.c#L1871
>
> While this is a bug in the driver, rather than VirtualBox, the latter could
> better handle the situation by discarding packets when the buffer size or
> addresses of zero rather than attempting (but ultimately failing) to deliver
> it.
As I explained earlier, u16RxBSize cannot be zero under any circumstances, so
your patch essentially introduces two things: (1) discarding rx descriptor
cache if a descriptor with zero buffer address is seen, and (2) not writing
back the processed descriptor with DD (descriptor done) bit set. Both things go
against e1000 spec. The spec explicitly allows descriptors with zero buffer
addresses, see section 3.2.5.2 “Null Descriptor Padding”, and prescribes the
proper course of action for hardware. I quote,
"Hardware stores no data in descriptors with a null data address. Software can
make use of this property to cause the first condition under receive descriptor
packing to occur early. Hardware writes back null descriptors with the DD bit
set in the status byte and all other bits unchanged.”
This is precisely what the existing code does. Another issue with the patch is
that it won’t compile without E1K_WITH_RXD_CACHE defined.
Please let me know if there are any special steps that could help me to
re-produce this issue. I’ll try to pester an Ubuntu guest by spamming
broadcasts in host’s LAN, but I vaguely remember trying it before without any
luck. Perhaps I had done it before I introduced rx cache, or it could have been
a different kind of guest, not Linux. Let me try again.
___
vbox-dev mailing list
vbox-dev@virtualbox.org
https://www.virtualbox.org/mailman/listinfo/vbox-dev