On Mon, 23 Aug 2021 at 21:50, Peter Xu <pet...@redhat.com> wrote:
>
> On Mon, Aug 23, 2021 at 08:10:50PM +0100, Peter Maydell wrote:
> > On Mon, 23 Aug 2021 at 17:42, Philippe Mathieu-Daudé <phi...@redhat.com>
> > wrote:
> > >
> > > This series aim to kill a recent class of bug, the infamous
> > > "DMA reentrancy" issues found by Alexander while fuzzing.
> > >
> > > Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:
> > >
> > > - MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
> > > - MEMTXPERM_UNRESTRICTED (allow list approach)
> > > - MEMTXPERM_RAM_DEVICE (example of deny list approach)
> > >
> > > If a transaction permission is not allowed (for example access
> > > to non-RAM device), we return the specific MEMTX_BUS_ERROR.
> > >
> > > Permissions are checked in after the flatview is resolved, and
> > > before the access is done, in a new function: flatview_access_allowed().
> > So I'm not going to say 'no' to this, because we have a real
> > recursive-device-handling problem and I don't have a better
> > idea to hand, but the thing about this is that we end up with
> > behaviour which is not what the real hardware does. I'm not
> > aware of any DMA device which has this kind of "can only DMA
> > to/from RAM, and aborts on access to a device" behaviour...
>
> Sorry for not being familiar with the context - is there more info regarding
> the problem to fix?
So, the general problem is that we have a whole class of bugs that
look like this:

 * Device A is DMA-capable. It also has a set of memory mapped
   registers which can be used to control it.
 * Malicious guest code (or the fuzzer) programs A's DMA engine to
   do a DMA read or write to the address where A's own registers
   are mapped.
 * Typically, the MemoryRegionOps write function for the register
   block will handle the "write to start-dma register" by doing the
   DMA, ie calling address_space_write(), pci_dma_write(), or
   equivalent. Because of the target address the guest code has set,
   that will result in the memory subsystem calling back into the
   same MemoryRegionOps write function, recursively.
 * Our code implementing the model of device A is not at all
   expecting this re-entrancy, and might crash, access freed memory,
   or otherwise misbehave.

You can elaborate on that basic scenario, for instance with a loop of
multiple devices where you program device A to do a DMA write to
device B's registers which starts device B doing a DMA write to A's
registers. Nor is it inherently limited to DMA -- device A could be
made to assert a qemu_irq that is connected to device B in a way that
causes device B to do something that results in code calling back
into device A; or maybe device A DMAs to a register in device B that
implements a device-reset on device A. DMA is just the easiest for
guest code to set up and has the least restrictions on how it's
connected up.

In some specific cases we have "fixed" individual instances of this
bug by putting in checks or changes to whatever device model the
fuzzer happened to find a problem with, or by putting in slightly
wider-scale attempts to catch this (eg commit 22dc8663d9f which
prevents re-entering a NIC device's packet-rx callback function if
the guest has set it up so that a received packet results in DMA that
triggers another received packet). But we don't have a coherent model
of how we ought to structure device models that can avoid this
problem in a general way, and I think that until we do we're liable
to keep running into specific bugs, some of which will be (or at
least be labelled as) "security issues".

Philippe's series here tries to fix this, at least for any variety of
this bug where there is a DMA access in the loop, by forbidding DMA
accesses to MMIO regions not backed by RAM. That works, in that it
breaks the loop; as I mentioned in my other email, it does it in a
way that's not the way real h/w behaves.

Unfortunately "what does real h/w do?" is not necessarily a helpful
guide for QEMU design in this situation: I think that real hardware:

 (a) often doesn't see the same kind of problem because the design
     will usually decouple the DMA engine from the register-access
     logic in a way that means it naturally doesn't literally lock up
 (b) often won't have been designed to deal with "software programs a
     DMA-to-self" either, but the threat model for real hw is
     different, in that software has many ways of making the overall
     system crash, hang or misbehave; it often doesn't have the "need
     to allow untrusted software to touch this device" situation.

One could have QEMU work somewhat like (a) by mandating that all DMA
of any kind was done in separate bottom-half routines and not
directly from the register-write code. That would probably reduce
performance and be a lot of code to restructure.
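To make the shape of that concrete, here's a rough sketch with a
completely made-up device (the "mydev" names, register layout and
buffer size are invented purely for illustration; only the
qemu_bh_*() and address_space_write() calls are real QEMU APIs):

/* Illustrative sketch only: instead of doing the transfer from inside
 * the MemoryRegionOps write handler, stash the request and kick a
 * bottom half that does the DMA later, from the main loop. */
#include "qemu/osdep.h"
#include "qemu/main-loop.h"
#include "exec/memory.h"

typedef struct MyDevState {
    AddressSpace *dma_as;   /* address space the device DMAs into */
    QEMUBH *dma_bh;         /* bottom half that actually runs the DMA */
    hwaddr dma_dst;         /* destination address programmed by the guest */
    uint8_t dma_buf[512];   /* data to transfer */
    bool dma_pending;
} MyDevState;

/* Runs from the main loop, not from inside the guest's register write,
 * so a guest-programmed "DMA to my own registers" can no longer call
 * back into mydev_mmio_write() while it is still on the stack. */
static void mydev_dma_bh(void *opaque)
{
    MyDevState *s = opaque;

    if (!s->dma_pending) {
        return;
    }
    s->dma_pending = false;
    address_space_write(s->dma_as, s->dma_dst, MEMTXATTRS_UNSPECIFIED,
                        s->dma_buf, sizeof(s->dma_buf));
}

static void mydev_mmio_write(void *opaque, hwaddr addr,
                             uint64_t val, unsigned size)
{
    MyDevState *s = opaque;

    switch (addr) {
    case 0x00:              /* hypothetical DMA destination register */
        s->dma_dst = val;
        break;
    case 0x04:              /* hypothetical "start DMA" register */
        s->dma_pending = true;
        qemu_bh_schedule(s->dma_bh);    /* defer the transfer */
        break;
    }
}

/* In the device's realize/init code:
 *     s->dma_bh = qemu_bh_new(mydev_dma_bh, s);
 */

The guest can still program a DMA-to-self, but by the time
mydev_dma_bh() runs the register write has already returned, so the
worst it achieves is scheduling more work rather than recursing into
a handler that's still on the stack.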
It would also deal with another class of "guest code can make QEMU
hang by programming it to do an enormous DMA all at once or by
setting up an infinitely looping chain of DMA commands" bugs, though.

I was vaguely tossing an idea around in the back of my mind about
whether you could have a flag on devices that marked them as "this
device is currently involved in IO", such that you could then just
fail the last DMA (or qemu_irq_set, or whatever) that would complete
the loop back to a device that was already doing IO. But that would
need a lot of thinking through to figure out if it's feasible, and
it's probably a lot of code change.

thanks
-- PMM
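PS: purely to illustrate the shape of that "currently doing IO" flag
idea (nothing like this exists in the tree today; the names are
invented, and where the flag and the check ought to live is exactly
the part that needs thinking through), a crude per-device version
might look like:

#include "qemu/osdep.h"
#include "qemu/log.h"
#include "exec/hwaddr.h"

typedef struct MyDevState {
    /* ... existing device state ... */
    bool in_io;     /* true while this device is servicing an access */
} MyDevState;

static void mydev_mmio_write(void *opaque, hwaddr addr,
                             uint64_t val, unsigned size)
{
    MyDevState *s = opaque;

    if (s->in_io) {
        /* This write arrived via the device's own DMA (directly or
         * through a loop of other devices): fail it instead of
         * re-entering the model. */
        qemu_log_mask(LOG_GUEST_ERROR,
                      "mydev: re-entrant MMIO access ignored\n");
        return;
    }

    s->in_io = true;
    /* ... decode the register, possibly kick off DMA ... */
    s->in_io = false;
}

Whether that check belongs in each device model, in the MemoryRegion,
or in the DMA API, and what "fail" should mean for the access that
trips it, is the part that would need the thinking through mentioned
above.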