On Mon, 23 Aug 2021 at 21:50, Peter Xu <pet...@redhat.com> wrote:
>
> On Mon, Aug 23, 2021 at 08:10:50PM +0100, Peter Maydell wrote:
> > On Mon, 23 Aug 2021 at 17:42, Philippe Mathieu-Daudé <phi...@redhat.com>
> > wrote:
> > >
> > > This series aim to kill a recent class of bug, the infamous
> > > "DMA reentrancy" issues found by Alexander while fuzzing.
> > >
> > > Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:
> > >
> > > - MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
> > > - MEMTXPERM_UNRESTRICTED (allow list approach)
> > > - MEMTXPERM_RAM_DEVICE (example of deny list approach)
> > >
> > > If a transaction permission is not allowed (for example access
> > > to non-RAM device), we return the specific MEMTX_BUS_ERROR.
> > >
> > > Permissions are checked in after the flatview is resolved, and
> > > before the access is done, in a new function: flatview_access_allowed().
> > So I'm not going to say 'no' to this, because we have a real
> > recursive-device-handling problem and I don't have a better
> > idea to hand, but the thing about this is that we end up with
> > behaviour which is not what the real hardware does. I'm not
> > aware of any DMA device which has this kind of "can only DMA
> > to/from RAM, and aborts on access to a device" behaviour...
>
> Sorry for not being familiar with the context - is there more info regarding
> the problem to fix?
So, the general problem is that we have a whole class of bugs that
look like this:

 * Device A is DMA-capable. It also has a set of memory mapped
   registers which can be used to control it.
 * Malicious guest code (or the fuzzer) programs A's DMA engine to
   do a DMA read or write to the address where A's own registers
   are mapped.
 * Typically, the MemoryRegionOps write function for the register
   block will handle the "write to start-dma register" by doing the
   DMA, ie calling address_space_write(), pci_dma_write(), or
   equivalent. Because of the target address the guest code has set,
   that will result in the memory subsystem calling back into the
   same MemoryRegionOps write function, recursively.
 * Our code implementing the model of device A is not at all
   expecting this re-entrancy, and might crash, access freed memory,
   or otherwise misbehave.

You can elaborate on that basic scenario, for instance with a loop of
multiple devices where you program device A to do a DMA write to
device B's registers which starts device B doing a DMA write to A's
registers. Nor is it inherently limited to DMA -- device A could be
made to assert a qemu_irq that is connected to device B in a way that
causes device B to do something that results in code calling back
into device A; or maybe device A DMAs to a register in device B that
implements a device-reset on device A. DMA is just the easiest for
guest code to set up and has the least restrictions on how it's
connected up.

In some specific cases we have "fixed" individual instances of this
bug by putting in checks or changes to whatever device model the
fuzzer happened to find a problem with, or by putting in slightly
wider-scale attempts to catch this (eg commit 22dc8663d9f which
prevents re-entering a NIC device's packet-rx callback function if
the guest has set it up so that a received packet results in DMA that
triggers another received packet). But we don't have a coherent model
of how we ought to structure device models that can avoid this
problem in a general way, and I think that until we do we're liable
to keep running into specific bugs, some of which will be (or at
least be labelled as) "security issues".

Philippe's series here tries to fix this, at least for any variety of
this bug where there is a DMA access in the loop, by forbidding DMA
accesses to MMIO regions not backed by RAM. That works, in that it
breaks the loop; as I mentioned in my other email, it does it in a
way that's not the way real h/w behaves.

Unfortunately "what does real h/w do?" is not necessarily a helpful
guide for QEMU design in this situation: I think that real hardware:

 (a) often doesn't see the same kind of problem because the design
     will usually decouple the DMA engine from the register-access
     logic in a way that means it naturally doesn't literally lock up
 (b) often won't have been designed to deal with "software programs a
     DMA-to-self" either, but the threat model for real hw is
     different, in that software has many ways of making the overall
     system crash, hang or misbehave; it often doesn't have the "need
     to allow untrusted software to touch this device" situation.

One could have QEMU work somewhat like (a) by mandating that all DMA
of any kind was done in separate bottom-half routines and not
directly from the register-write code. That would probably reduce
performance and be a lot of code to restructure.
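To make the shape of that concrete, here's a rough sketch with a
completely made-up device (the "mydev" names, register layout and
buffer size are invented purely for illustration; only the
qemu_bh_*() and address_space_write() calls are real QEMU APIs):

/* Illustrative sketch only: instead of doing the transfer from inside
 * the MemoryRegionOps write handler, stash the request and kick a
 * bottom half that does the DMA later, from the main loop. */
#include "qemu/osdep.h"
#include "qemu/main-loop.h"
#include "exec/memory.h"

typedef struct MyDevState {
    AddressSpace *dma_as;   /* address space the device DMAs into */
    QEMUBH *dma_bh;         /* bottom half that actually runs the DMA */
    hwaddr dma_dst;         /* destination address programmed by the guest */
    uint8_t dma_buf[512];   /* data to transfer */
    bool dma_pending;
} MyDevState;

/* Runs from the main loop, not from inside the guest's register write,
 * so a guest-programmed "DMA to my own registers" can no longer call
 * back into mydev_mmio_write() while it is still on the stack. */
static void mydev_dma_bh(void *opaque)
{
    MyDevState *s = opaque;

    if (!s->dma_pending) {
        return;
    }
    s->dma_pending = false;
    address_space_write(s->dma_as, s->dma_dst, MEMTXATTRS_UNSPECIFIED,
                        s->dma_buf, sizeof(s->dma_buf));
}

static void mydev_mmio_write(void *opaque, hwaddr addr,
                             uint64_t val, unsigned size)
{
    MyDevState *s = opaque;

    switch (addr) {
    case 0x00:              /* hypothetical DMA destination register */
        s->dma_dst = val;
        break;
    case 0x04:              /* hypothetical "start DMA" register */
        s->dma_pending = true;
        qemu_bh_schedule(s->dma_bh);    /* defer the transfer */
        break;
    }
}

/* In the device's realize/init code:
 *     s->dma_bh = qemu_bh_new(mydev_dma_bh, s);
 */

The guest can still program a DMA-to-self, but by the time
mydev_dma_bh() runs the register write has already returned, so the
worst it achieves is scheduling more work rather than recursing into
a handler that's still on the stack.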
It would also deal with another class of "guest code can make QEMU
hang by programming it to do an enormous DMA all at once or by
setting up an infinitely looping chain of DMA commands" bugs, though.

I was vaguely tossing an idea around in the back of my mind about
whether you could have a flag on devices that marked them as "this
device is currently involved in IO", such that you could then just
fail the last DMA (or qemu_irq_set, or whatever) that would complete
the loop back to a device that was already doing IO. But that would
need a lot of thinking through to figure out if it's feasible, and
it's probably a lot of code change.

thanks
-- PMM
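PS: purely to illustrate the shape of that "currently doing IO" flag
idea (nothing like this exists in the tree today; the names are
invented, and where the flag and the check ought to live is exactly
the part that needs thinking through), a crude per-device version
might look like:

#include "qemu/osdep.h"
#include "qemu/log.h"
#include "exec/hwaddr.h"

typedef struct MyDevState {
    /* ... existing device state ... */
    bool in_io;     /* true while this device is servicing an access */
} MyDevState;

static void mydev_mmio_write(void *opaque, hwaddr addr,
                             uint64_t val, unsigned size)
{
    MyDevState *s = opaque;

    if (s->in_io) {
        /* This write arrived via the device's own DMA (directly or
         * through a loop of other devices): fail it instead of
         * re-entering the model. */
        qemu_log_mask(LOG_GUEST_ERROR,
                      "mydev: re-entrant MMIO access ignored\n");
        return;
    }

    s->in_io = true;
    /* ... decode the register, possibly kick off DMA ... */
    s->in_io = false;
}

Whether that check belongs in each device model, in the MemoryRegion,
or in the DMA API, and what "fail" should mean for the access that
trips it, is the part that would need the thinking through mentioned
above.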