Hi Peter, > -----Original Message----- > From: Qemu-devel <qemu-devel- > bounces+thanos.makatos=nutanix....@nongnu.org> On Behalf Of Peter > Xu > Sent: 21 April 2020 11:44 > To: Paolo Bonzini <pbonz...@redhat.com> > Cc: Markus Armbruster <arm...@redhat.com>; QEMU Devel Mailing List > <qemu-devel@nongnu.org> > Subject: Re: Question on memory commit during MR finalize() > > On Tue, Apr 21, 2020 at 11:43:36AM +0200, Paolo Bonzini wrote: > > On 21/04/20 01:31, Peter Xu wrote: > > >> > > >> However, instead of memory_region_transaction_commit, > > >> memory_region_finalize probably should do something like > > >> > > >> --memory_region_transaction_depth; > > >> assert (memory_region_transaction_depth || > > >> (!memory_region_update_pending && > > >> !ioeventfd_update_pending)); > > > Ah I see; this makes sense. > > > > > > And finally I found the problem, which is indeed the bug in my own > > > tree - I forgot to remove the previous changes to flush the dirty > > > ring during mem removal (basically that's run_on_cpu() called during > > > a memory commit, that will wrongly release the BQL without being > noticed). > > > > > > Besides above assert, I'm thinking maybe we can also assert on > something like: > > > > > > !(memory_region_transaction_depth || > memory_region_update_pending || > > > ioeventfd_update_pending) > > > > > > When releasing BQL (unlock, or qemu_cond_wait() on BQL, which should > > > cover run_on_cpu()), so that we can identify misuse of BQL easier like > this. > > > > Asserting invariants around lock release are an interesting concept, > > but I'm not sure where to insert them exactly. But it would be great > > if you would like to introduce an assert_empty_memory_transaction() > > function with the assertion I quoted above. > > Let me give it a shot later today. :)
We're hitting this issue using a QEMU branch where JJ is using vfio-user as the transport for multiprocess-qemu (https://github.com/oracle/qemu/issues/9). We can reproduce it fairly reliably by migrating a virtual SPDK NVMe controller (the NVMf/vfio-user target with experimental migration support, https://review.spdk.io/gerrit/c/spdk/spdk/+/7617/14). I can provide detailed repro instructions but first I want to make sure we're not missing any patches.