On Wed, 27 Mar 2019 23:03:58 +1100
David Gibson <da...@gibson.dropbear.id.au> wrote:

> On Wed, Mar 27, 2019 at 10:38:38AM +0100, Igor Mammedov wrote:
> > On Wed, 27 Mar 2019 20:07:57 +1100
> > David Gibson <da...@gibson.dropbear.id.au> wrote:
> > 
> > > On Wed, Mar 27, 2019 at 09:57:45AM +0100, Igor Mammedov wrote:
> > > > On Wed, 27 Mar 2019 11:11:46 +1100
> > > > David Gibson <da...@gibson.dropbear.id.au> wrote:
> > > >   
> > > > > On Tue, Mar 26, 2019 at 03:08:20PM +0100, Igor Mammedov wrote:  
> > > > > > On Tue, 26 Mar 2019 14:50:58 +1100
> > > > > > David Gibson <da...@gibson.dropbear.id.au> wrote:
> > > > > >     
> > > > > > > qemu_getrampagesize() works out the minimum host page size 
> > > > > > > backing any of
> > > > > > > guest RAM.  This is required in a few places, such as for POWER8 
> > > > > > > PAPR KVM
> > > > > > > guests, because limitations of the hardware virtualization mean 
> > > > > > > the guest
> > > > > > > can't use pagesizes larger than the host pages backing its memory.
> > > > > > > 
> > > > > > > However, it currently checks against *every* memory backend, 
> > > > > > > whether or not
> > > > > > > it is actually mapped into guest memory at the moment.  This is 
> > > > > > > incorrect.
> > > > > > > 
> > > > > > > This can cause a problem attempting to add memory to a POWER8 
> > > > > > > pseries KVM
> > > > > > > guest which is configured to allow hugepages in the guest (e.g.
> > > > > > > -machine cap-hpt-max-page-size=16m).  If you attempt to add 
> > > > > > > non-hugepage,
> > > > > > > you can (correctly) create a memory backend, however it 
> > > > > > > (correctly) will
> > > > > > > throw an error when you attempt to map that memory into the guest 
> > > > > > > by
> > > > > > > 'device_add'ing a pc-dimm.
> > > > > > > 
> > > > > > > What's not correct is that if you then reset the guest a startup 
> > > > > > > check
> > > > > > > against qemu_getrampagesize() will cause a fatal error because of 
> > > > > > > the new
> > > > > > > memory object, even though it's not mapped into the guest.    
> > > > > > I'd say that backend should be remove by mgmt app since device_add 
> > > > > > failed
> > > > > > instead of leaving it to hang around. (but fatal error either not a 
> > > > > > nice
> > > > > > behavior on QEMU part)    
> > > > > 
> > > > > I agree, but reset could be guest initiated, so even if management is
> > > > > doing the right thing there's a window where it could break.  
> > > > 
> > > > We could (log term) get rid of qemu_getrampagesize() (it's not really 
> > > > good
> > > > to push machine/target specific condition into generic function) and 
> > > > move
> > > > pagesize calculation closer to machine. In this case machine (spapr) 
> > > > knows
> > > > exactly when and what is mapped and can enumerate NOT backends but 
> > > > mapped
> > > > memory regions and/or pc-dimms (lets say we have 
> > > > memory_region_page_size()
> > > > and apply whatever policy to the results.  
> > > 
> > > So.. it used to be in the machine specific code, but was made generic
> > > because it's used in the vfio code.  Where it is ppc specific, but not
> > > keyed into the machine specific code in a way that we can really
> > > re-use it there.
> > It probably was the easiest way to hack things together, but then API gets
> > generalized and misused and then tweaked to specific machine usecase
> > and it gets only worse over time.
> 
> I don't really know what you mean by that.  In both the (main) ppc and
> VFIO cases the semantics we want from getrampagesize() really are the
> same: what's the smallest chunk of guest-contiguous memory we can rely
> on to also be host-contiguous.
"smallest chunk of guest-contiguous memory" is frontend abstraction while
the later "host-contiguous" is backend's one. But qemu_getrampagesize()
operates though on backend side (which doesn't have to represent guest RAM).
Shouldn't we look for guest RAM from frontend side (to be sure we are dealing 
with guest RAM)
and then find out corresponding host side attributes from there?

btw:
above doesn't mean that we should block your simpler fix for 4.0.
So for this patch with David's s390 patch as precondition included

Reviewed-by: Igor Mammedov <imamm...@redhat.com>


> 
> > > > > > > This patch corrects the problem by adjusting 
> > > > > > > find_max_supported_pagesize()
> > > > > > > (called from qemu_getrampagesize() via object_child_foreach) to 
> > > > > > > exclude
> > > > > > > non-mapped memory backends.    
> > > > > > I'm not sure about if it's ok do so. It depends on where from
> > > > > > qemu_getrampagesize() is called. For example s390 calls it rather 
> > > > > > early
> > > > > > when initializing KVM, so there isn't anything mapped yet.    
> > > > > 
> > > > > Oh.  Drat.
> > > > >   
> > > > > > And once I replace -mem-path with hostmem backend and drop
> > > > > > qemu_mempath_getpagesize(mem_path) /which btw aren't guarantied to 
> > > > > > be mapped either/
> > > > > > this patch might lead to incorrect results for initial memory as
> > > > > > well.    
> > > > > 
> > > > > Uh.. I don't immediately see why.  
> > > > It depends on when qemu_getrampagesize() is called with this patch.
> > > > 
> > > > Maybe qemu_getrampagesize() is not a good interface anymore?
> > > > I also see it being called directly from device side hw/vfio/spapr.c, 
> > > > which
> > > > in my opinion should be propagated from machine,  
> > > 
> > > Um.. I'm not sure what you mean by that.
> > It would be better if machine would set a page size property on vfio device
> > when it's created.
> 
> No, that doesn't work.  The limitation doesn't originate with the
> machine, it originates with the *host*.  I mean, in practice it'll
> nearly always be KVM and so a machine matching the host, but it
> doesn't actually have to be that way.
> 
> It is possible to run an x86 (or ARM) guest with TCG using a VFIO
> device on a POWER host, and that would need to use the POWER VFIO
> driver.  You'd need to line up a bunch of details to make it work, and
> it'd be kind of pointless, but it's possible.
> 


Reply via email to