New theory and testing a potential fix

Because the driver had a major refactor of the ring buffer, it looks like we are experiencing a ring buffer stall. The stall seems to be caused by the fact that the network interface is connected to the BCM2712 chip via the RP1 PCIe southbridge and is not hardware DMA coherent. This means that when a descriptor is read from the ring buffer, there is a risk that the CPU sees stale data from its cache rather than what the hardware last wrote. As the driver relies on a bit in the descriptor that indicates whether it has been processed by the hardware, a stale read of that bit can lead to a stall if conditions are just right. The fix I'm trying is to invalidate the cache before reading the descriptor. As you can imagine, this has a performance hit, although for the RPi5 it should be negligible.
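To make the theory a bit more concrete, here is a toy userspace model of the problem in C. It is only a sketch and not the driver code: the struct layout, the `DESC_DONE` bit and all function names are made up for illustration, and the "cache" is just a second array that the CPU side reads from. In a real Linux driver I would expect the invalidate step to go through the DMA API (something like `dma_sync_single_for_cpu()`), but that is an assumption on my part about how such a fix could look.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4u
#define DESC_DONE 0x1u /* hypothetical "hardware has processed this" bit */

struct desc {
    uint32_t addr;
    uint32_t ctrl;
};

/* Toy model: `dram` is what the device writes, `cache` is what the CPU reads. */
struct ring {
    struct desc dram[RING_SIZE];
    struct desc cache[RING_SIZE];
    unsigned int tail; /* next descriptor the CPU expects to be completed */
};

static void ring_init(struct ring *r)
{
    memset(r, 0, sizeof(*r));
}

/* Device side: the DMA write lands in memory and bypasses the CPU cache. */
static void device_complete(struct ring *r, unsigned int idx)
{
    assert(idx < RING_SIZE);
    r->dram[idx].ctrl |= DESC_DONE;
}

/* Stand-in for a cache invalidate: drop the cached copy and refetch it. */
static void invalidate_desc(struct ring *r, unsigned int idx)
{
    assert(idx < RING_SIZE);
    r->cache[idx] = r->dram[idx];
}

/* CPU side: reap completed descriptors and return how many were reaped. */
static unsigned int reap(struct ring *r, bool invalidate_first)
{
    unsigned int reaped = 0;

    while (reaped < RING_SIZE) {
        unsigned int idx = r->tail % RING_SIZE;

        if (invalidate_first)
            invalidate_desc(r, idx);

        /* With a stale cached copy this check never passes: the stall. */
        if (!(r->cache[idx].ctrl & DESC_DONE))
            break;

        /* Hand the descriptor back to the device and write it out. */
        r->cache[idx].ctrl &= ~DESC_DONE;
        r->dram[idx].ctrl = r->cache[idx].ctrl;
        r->tail++;
        reaped++;
    }
    return reaped;
}
```

If the device completes descriptors 0 and 1, `reap(&r, false)` reaps nothing because the cached copies still look unprocessed, while `reap(&r, true)` picks up both. The real hardware is of course less deterministic than this, since a stale line only survives if it has not been evicted in the meantime, which would fit with the stall only showing up when conditions are just right.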
Now, take this with a grain of salt, as I'm just learning about all this, and there may also be other alternatives, such as using non-cacheable memory (which sounds too slow) or using some of the 64KB of shared SRAM; I don't know. Having said all that, my test node has been running all night and is still pinging now. However, with my luck, it will fail just as I post this.

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2133877

Title:
  Complete network hang on Raspberry Pi 5 with kernel 6.17 under load - possibly related to CPU frequency scaling
