On Wed, Mar 27, 2019 at 06:24:17PM +0100, Igor Mammedov wrote: > On Wed, 27 Mar 2019 23:03:58 +1100 > David Gibson <da...@gibson.dropbear.id.au> wrote: > > > On Wed, Mar 27, 2019 at 10:38:38AM +0100, Igor Mammedov wrote: > > > On Wed, 27 Mar 2019 20:07:57 +1100 > > > David Gibson <da...@gibson.dropbear.id.au> wrote: > > > > > > > On Wed, Mar 27, 2019 at 09:57:45AM +0100, Igor Mammedov wrote: > > > > > On Wed, 27 Mar 2019 11:11:46 +1100 > > > > > David Gibson <da...@gibson.dropbear.id.au> wrote: > > > > > > > > > > > On Tue, Mar 26, 2019 at 03:08:20PM +0100, Igor Mammedov wrote: > > > > > > > On Tue, 26 Mar 2019 14:50:58 +1100 > > > > > > > David Gibson <da...@gibson.dropbear.id.au> wrote: > > > > > > > > > > > > > > > qemu_getrampagesize() works out the minimum host page size > > > > > > > > backing any of > > > > > > > > guest RAM. This is required in a few places, such as for > > > > > > > > POWER8 PAPR KVM > > > > > > > > guests, because limitations of the hardware virtualization mean > > > > > > > > the guest > > > > > > > > can't use pagesizes larger than the host pages backing its > > > > > > > > memory. > > > > > > > > > > > > > > > > However, it currently checks against *every* memory backend, > > > > > > > > whether or not > > > > > > > > it is actually mapped into guest memory at the moment. This is > > > > > > > > incorrect. > > > > > > > > > > > > > > > > This can cause a problem attempting to add memory to a POWER8 > > > > > > > > pseries KVM > > > > > > > > guest which is configured to allow hugepages in the guest (e.g. > > > > > > > > -machine cap-hpt-max-page-size=16m). If you attempt to add > > > > > > > > non-hugepage, > > > > > > > > you can (correctly) create a memory backend, however it > > > > > > > > (correctly) will > > > > > > > > throw an error when you attempt to map that memory into the > > > > > > > > guest by > > > > > > > > 'device_add'ing a pc-dimm. > > > > > > > > > > > > > > > > What's not correct is that if you then reset the guest a > > > > > > > > startup check > > > > > > > > against qemu_getrampagesize() will cause a fatal error because > > > > > > > > of the new > > > > > > > > memory object, even though it's not mapped into the guest. > > > > > > > I'd say that backend should be remove by mgmt app since > > > > > > > device_add failed > > > > > > > instead of leaving it to hang around. (but fatal error either not > > > > > > > a nice > > > > > > > behavior on QEMU part) > > > > > > > > > > > > I agree, but reset could be guest initiated, so even if management > > > > > > is > > > > > > doing the right thing there's a window where it could break. > > > > > > > > > > We could (log term) get rid of qemu_getrampagesize() (it's not really > > > > > good > > > > > to push machine/target specific condition into generic function) and > > > > > move > > > > > pagesize calculation closer to machine. In this case machine (spapr) > > > > > knows > > > > > exactly when and what is mapped and can enumerate NOT backends but > > > > > mapped > > > > > memory regions and/or pc-dimms (lets say we have > > > > > memory_region_page_size() > > > > > and apply whatever policy to the results. > > > > > > > > So.. it used to be in the machine specific code, but was made generic > > > > because it's used in the vfio code. Where it is ppc specific, but not > > > > keyed into the machine specific code in a way that we can really > > > > re-use it there. > > > It probably was the easiest way to hack things together, but then API gets > > > generalized and misused and then tweaked to specific machine usecase > > > and it gets only worse over time. > > > > I don't really know what you mean by that. In both the (main) ppc and > > VFIO cases the semantics we want from getrampagesize() really are the > > same: what's the smallest chunk of guest-contiguous memory we can rely > > on to also be host-contiguous. > "smallest chunk of guest-contiguous memory" is frontend abstraction while > the later "host-contiguous" is backend's one. But qemu_getrampagesize() > operates though on backend side (which doesn't have to represent guest RAM). > Shouldn't we look for guest RAM from frontend side (to be sure we are dealing > with guest RAM) > and then find out corresponding host side attributes from there?
I'm not really sure what you have in mind there. In cases where we have a specific guest address, we do try to go from that to backend information. Unfortunately in some cases we can't know a guest address beforehand, so we have to calculate for the worst case, which is what this is about. > btw: > above doesn't mean that we should block your simpler fix for 4.0. > So for this patch with David's s390 patch as precondition included > > Reviewed-by: Igor Mammedov <imamm...@redhat.com> > > > > > > > > > > > > This patch corrects the problem by adjusting > > > > > > > > find_max_supported_pagesize() > > > > > > > > (called from qemu_getrampagesize() via object_child_foreach) to > > > > > > > > exclude > > > > > > > > non-mapped memory backends. > > > > > > > I'm not sure about if it's ok do so. It depends on where from > > > > > > > qemu_getrampagesize() is called. For example s390 calls it rather > > > > > > > early > > > > > > > when initializing KVM, so there isn't anything mapped yet. > > > > > > > > > > > > Oh. Drat. > > > > > > > > > > > > > And once I replace -mem-path with hostmem backend and drop > > > > > > > qemu_mempath_getpagesize(mem_path) /which btw aren't guarantied > > > > > > > to be mapped either/ > > > > > > > this patch might lead to incorrect results for initial memory as > > > > > > > well. > > > > > > > > > > > > Uh.. I don't immediately see why. > > > > > It depends on when qemu_getrampagesize() is called with this patch. > > > > > > > > > > Maybe qemu_getrampagesize() is not a good interface anymore? > > > > > I also see it being called directly from device side hw/vfio/spapr.c, > > > > > which > > > > > in my opinion should be propagated from machine, > > > > > > > > Um.. I'm not sure what you mean by that. > > > It would be better if machine would set a page size property on vfio > > > device > > > when it's created. > > > > No, that doesn't work. The limitation doesn't originate with the > > machine, it originates with the *host*. I mean, in practice it'll > > nearly always be KVM and so a machine matching the host, but it > > doesn't actually have to be that way. > > > > It is possible to run an x86 (or ARM) guest with TCG using a VFIO > > device on a POWER host, and that would need to use the POWER VFIO > > driver. You'd need to line up a bunch of details to make it work, and > > it'd be kind of pointless, but it's possible. > > > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature