On Tue, Jul 9, 2024 at 10:56 AM Yong Huang <[email protected]> wrote: > > > > On Tue, Jul 9, 2024 at 10:41 AM Jason Wang <[email protected]> wrote: >> >> On Mon, Jul 8, 2024 at 1:17 PM Yong Huang <[email protected]> wrote: >> > >> > >> > >> > On Mon, Jul 8, 2024 at 11:21 AM Jason Wang <[email protected]> wrote: >> >> >> >> On Sat, Jul 6, 2024 at 4:30 AM Hyman Huang <[email protected]> wrote: >> >> > >> >> > Unexpected work by certain Windows guests equipped with the e1000 >> >> > interface can cause the network to go down and never come back up >> >> > again unless the guest's interface is reset. >> >> > >> >> > To reproduce the failure: >> >> > 1. Set up two guests with a Windows 2016 or 2019 server operating >> >> > system. >> >> >> >> I vaguely remember e1000 support for Windows has been deprecated for >> >> several years... >> >> >> >> That's why e1000e or igb is implemented in Qemu. >> >> >> >> > 2. Set up the e1000 interface for the guests. >> >> > 3. Pressurize the network slightly between two guests using the iPerf >> >> > tool. >> >> > >> >> > The network goes down after a few days (2-5days), and the issue >> >> > is the result of not adhering to the e1000 specification. Refer >> >> > to the details of the specification at the following link: >> >> > https://www.intel.com/content/dam/doc/manual/pci-pci-x-family-gbe-controllers-software-dev-manual.pdf >> >> > >> >> > Chapter 3.2.6 describe the Receive Descriptor Tail register(RDT) >> >> > as following: >> >> > This register holds a value that is an offset from the base, and >> >> > identifies the location beyond the last descriptor hardware can >> >> > process. Note that tail should still point to an area in the >> >> > descriptor ring (somewhere between RDBA and RDBA + RDLEN). >> >> > This is because tail points to the location where software writes >> >> > the first new descriptor. >> >> > >> >> > This means that if the provider—in this case, QEMU—has not yet >> >> > loaded the packet, >> >> >> >> What do you mean by "load" here? >> > >> > >> > Sorry for failing to describe the details. >> > >> > The guest driver retrieves the packet from the receive ring buffer >> > after QEMU forwards it from the tun/tap interface in the e1000 >> > emulation. >> > >> > I used "load" to express "putting packets into the receive ring buffer." >> > >> >> >> >> >> >> > RDT should never point to that place. >> >> >> >> And "that place"? >> > >> > If a descriptor in the receive ring buffer has not been filled with a >> > packet address by QEMU, the descriptor therefore doesn't have any >> > available packets. The location of the descriptor should not be referred >> > to by RDT because the location is in the range that "hardware" handles. >> > >> > "that place" means the location of the descriptor in the ring buffer >> > that QEMU hasn't set any available packets related to. >> > >> >> >> >> >> >> > When >> >> > implementing the emulation of the e1000 interface, QEMU evaluates >> >> > if the receive ring buffer is full once the RDT equals the RDH, >> >> > based on the assumption that guest drivers adhere to this >> >> > criterion strictly. >> >> > >> >> > We applied the following log patch to assist in analyzing the >> >> > issue and eventually obtained the unexpected information. >> >> > >> >> > Log patch: >> >> > ----------------------------------------------------------------- >> >> > |--- a/hw/net/e1000.c >> >> > |+++ b/hw/net/e1000.c >> >> > |@@ -836,6 +836,9 @@ e1000_set_link_status(NetClientState *nc) >> >> > | static bool e1000_has_rxbufs(E1000State *s, size_t total_size) >> >> > | { >> >> > | int bufs; >> >> > |+ DBGOUT(RX, "rxbuf_size = %u, s->mac_reg[RDLEN] = %u, >> >> > s->mac_reg[RDH] = %u, s->mac_reg[RDT] = %u\n", >> >> > |+ s->rxbuf_size, s->mac_reg[RDLEN], s->mac_reg[RDH], >> >> > s->mac_reg[RDT]); >> >> > |+ >> >> > | /* Fast-path short packets */ >> >> > | if (total_size <= s->rxbuf_size) { >> >> > | if (s->mac_reg[RDH] == s->mac_reg[RDT] && s->last_overrun) >> >> > |@@ -1022,6 +1025,9 @@ e1000_receive_iov(NetClientState *nc, const >> >> > struct iovec *iov, int iovcnt) >> >> > | s->rxbuf_min_shift) >> >> > | n |= E1000_ICS_RXDMT0; >> >> > | >> >> > |+ DBGOUT(RX, "rxbuf_size = %u, s->mac_reg[RDLEN] = %u, >> >> > s->mac_reg[RDH] = %u, s->mac_reg[RDT] = %u\n", >> >> > |+ s->rxbuf_size, s->mac_reg[RDLEN], s->mac_reg[RDH], >> >> > s->mac_reg[RDT]); >> >> > |+ >> >> > ----------------------------------------------------------------- >> >> > >> >> > The last few logs of information when the network is down: >> >> > >> >> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, >> >> > s->mac_reg[RDH] = 897, s->mac_reg[RDT] = 885 >> >> > <- the receive ring buffer is checked for fullness in the >> >> > e1000_has_rxbufs function, not full. >> >> > >> >> > e1000: total_size = 64, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, >> >> > s->mac_reg[RDH] = 898, s->mac_reg[RDT] = 885 >> >> > <- RDT stays the same, RDH updates to 898, and 1 descriptor >> >> > utilized after putting the packet to ring buffer. >> >> > >> >> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, >> >> > s->mac_reg[RDH] = 898, s->mac_reg[RDT] = 885 >> >> > <- the receive ring buffer is checked for fullness in the >> >> > e1000_has_rxbufs function, not full. >> >> > >> >> > e1000: total_size = 64, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, >> >> > s->mac_reg[RDH] = 899, s->mac_reg[RDT] = 885 >> >> > <- RDT stays the same, RDH updates to 899, and 1 descriptor >> >> > utilized after putting the packet to ring buffer. >> >> > >> >> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, >> >> > s->mac_reg[RDH] = 899, s->mac_reg[RDT] = 885 >> >> > <- the receive ring buffer is checked for fullness in the >> >> > e1000_has_rxbufs function, not full. >> >> > >> >> > e1000: total_size = 64, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, >> >> > s->mac_reg[RDH] = 900, s->mac_reg[RDT] = 885 >> >> > <- RDT stays the same, RDH updates to 900 , and 1 descriptor >> >> > utilized after putting the packet to ring buffer. >> >> > >> >> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, >> >> > s->mac_reg[RDH] = 900, s->mac_reg[RDT] = 900 >> >> > <- The ring is full, according to e1000_has_rxbufs, because >> >> > of the RDT update to 900 and equals RDH ! >> >> >> >> Just to make sure I understand this, RDT==RDH means the ring is empty I >> >> think? >> >> >> >> >> >> See commit: >> >> >> >> commit e5b8b0d4ba29fe1268ba049519a1b0cf8552a21a >> >> Author: Dmitry Fleytman <[email protected]> >> >> Date: Fri Oct 19 07:56:55 2012 +0200 >> >> >> >> e1000: drop check_rxov, always treat RX ring with RDH == RDT as empty >> >> >> >> Real HW always treats RX ring with RDH == RDT as empty. >> >> Emulation is supposed to behave the same. >> > >> > >> > Indeed, I'm confused :(, the description in the comment claims that RX >> > rings with RDH == RDT as empty, but in implementation, it treats that as >> > overrun. >> > >> > See the following 2 contexts: >> > >> > 1. e1000_can_receive: >> > static bool e1000_can_receive(NetClientState *nc) >> > { >> > E1000State *s = qemu_get_nic_opaque(nc); >> > // e1000_has_rxbufs return true means ring buffer has >> > // available descriptors to use for QEMU. >> > // false means ring buffer overrun and QEMU should queue the packet >> > // and wait for the RDT update and available descriptors can be used. >> > >> > return e1000x_rx_ready(&s->parent_obj, s->mac_reg) && >> > e1000_has_rxbufs(s, 1) && !timer_pending(s->flush_queue_timer); >> > } >> >> Well we had in e1000_has_rx_bufs >> >> if (total_size <= s->rxbuf_size) { >> return s->mac_reg[RDH] != s->mac_reg[RDT]; >> } >> >> RDT!=RDH means RX ring has available descriptors for hardware? > > > IMHO, Yes.
Just to make sure we are on the same page, so RDT!=RDH, descriptors available for hardware RDT==RDH, descriptor ring is empty for hardware That is currently what the code did. Seems nothing wrong, or anything I missed here? Thanks > >> >> Adding more people. >> >> Thanks >> > > > -- > Best regards
