On Wed, 4 Sep 2019 at 03:41, Peter Xu <pet...@redhat.com> wrote:
> On Tue, Sep 03, 2019 at 05:50:56PM +0100, Peter Maydell wrote:
> > Do you have a backtrace of QEMU from the segfault? I'm having trouble
> > thinking of what the situation is when we'd try to invoke the
> > read handler on io_mem_notdirty...
>
> I've no good understanding of how PHYS_SECTION_NOTDIRTY is used
> yet... though from what I understand that's the thing this patch wants
> to fix.  Because after the broken commit, this line will be
> overwritten:
>
>     .valid.accepts = notdirty_mem_accepts,
>
> and accept() will be reset to NULL.
>
> With that, memory_region_access_valid(is_write=false) could return
> valid now (so a read could happen), while it should never, logically?

Having looked into this a bit further, I think that the problem here
is that in commit 30d7e098d5c38644359 we accidentally removed the
code for TLB_RECHECK-type TLB entries that handled the "actually this
is a RAM access" case after repeating the tlb_fill(). So instead of
read accesses to notdirty-mem going through that code and never getting
into the io_readx() function, now they do. That combined with the
bug where we made the .valid.accepts assignment stop working means
you can get into this segfault. This is quite rare because I think
only Arm M-profile CPUs and Sparc when using the 'invert endian'
page table bit (ie Solaris guests doing PCI stuff) will do this.

If we apply this patch to reinstate .valid.accepts then instead
of a segfault we'll get a guest exception; which is still not
the right behaviour.

So I think we need to:
 (1) fix the cputlb code to reinstate the "handle RAM immediately"
     codepath
 (2) either allow reads to notdirty-mem MRs (ie make them just
     read from the host backing RAM), or define them to be a QEMU
     bug and make them assert immediately the read is attempted

thanks
-- PMM

Reply via email to