New theory and testing a potential fix

Because the driver had a major refactor of the ring buffer, it looks
like we are experiencing a ring buffer stall. This seems to be caused by
the fact that the network interface is connected to the BCM2712 chip via
the RP1 PCIe southbridge and is not hardware DMA coherent. This means
that when the driver reads a descriptor from the ring buffer, there is a
risk that the CPU sees stale data from its cache. As the driver relies
on a bit in the descriptor that indicates whether it has been processed
by the hardware, a stale read can lead to a stall if conditions are just
right. The fix I'm trying is to invalidate the cache before reading the
descriptor. As you can imagine, this has a performance cost, although on
the RPi5 it should be negligible.
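
To make the theory concrete, here is a small userspace model of the
problem. It is not the driver code and not the patch I'm testing. The
names (sim_ring, DESC_DONE, driver_poll and so on) are all made up for
illustration, and the "cache" is just a second copy of each descriptor
that the simulated device never updates. In a real kernel driver I'd
expect the invalidate step to go through the DMA API (something like
dma_sync_single_for_cpu()), but that is an assumption about where such
a fix would live.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4u
#define DESC_DONE 0x1u /* hypothetical "hardware has processed this" bit */

struct desc {
    uint32_t status;
    uint32_t len;
};

/* Toy model: 'dram' is what the device writes via DMA, 'cache' is the
 * CPU's private copy, which a non-coherent device never updates. */
struct sim_ring {
    struct desc dram[RING_SIZE];
    struct desc cache[RING_SIZE];
    bool cache_valid[RING_SIZE];
};

static void ring_init(struct sim_ring *r)
{
    memset(r, 0, sizeof(*r));
}

/* CPU read through the cache: fill on a miss, otherwise return the
 * (possibly stale) cached copy. */
static struct desc cpu_read(struct sim_ring *r, unsigned int i)
{
    assert(i < RING_SIZE);
    if (!r->cache_valid[i]) {
        r->cache[i] = r->dram[i];
        r->cache_valid[i] = true;
    }
    return r->cache[i];
}

/* Device completes descriptor i: only DRAM changes, the cache is not
 * snooped or invalidated. */
static void device_complete(struct sim_ring *r, unsigned int i, uint32_t len)
{
    assert(i < RING_SIZE);
    r->dram[i].len = len;
    r->dram[i].status |= DESC_DONE;
}

/* Drop the cached copy so the next read goes back to DRAM. */
static void cache_invalidate(struct sim_ring *r, unsigned int i)
{
    assert(i < RING_SIZE);
    r->cache_valid[i] = false;
}

/* Driver check of the done bit, with or without the proposed
 * invalidate-before-read step. */
static bool driver_poll(struct sim_ring *r, unsigned int i, bool invalidate_first)
{
    if (invalidate_first)
        cache_invalidate(r, i);
    return (cpu_read(r, i).status & DESC_DONE) != 0;
}
```

In this model, if the driver polls a descriptor once before the device
finishes it, every later driver_poll() without the invalidate keeps
returning false, which is the stall. With invalidate_first set, the
same poll sees the done bit.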

Now, take this with a grain of salt, as I'm just learning about all
this. There may also be other options, such as using non-cacheable
memory (which sounds too slow) or using some of the 64KB of shared SRAM.
I don't know.

Having said all that, my test node has been running all night and is
still pinging now. However, with my luck, it will fail just as I post
this.

https://bugs.launchpad.net/bugs/2133877

Title:
  Complete network hang on Raspberry Pi 5 with kernel 6.17 under load -
  possibly related to CPU frequency scaling