Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0 crash
On 11/07/17 01:37 -0700, Jan Beulich wrote:
> >>> On 07.11.17 at 09:23, wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Tuesday, November 7, 2017 4:09 PM
> >> >>> On 07.11.17 at 02:37, wrote:
> >> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> >> Sent: Monday, November 6, 2017 5:17 PM
> >> >> >>> On 03.11.17 at 09:29, wrote:
> >> >> > We figured out the problem, some corner scripts triggered the error
> >> >> > injection at the same page (pfn 0x180020) twice, i.e. "./xen-mceinj
> >> >> > -t 0" run over one time, which resulted in Dom0 crash.
> >> >>
> >> >> But isn't this a valid scenario, which shouldn't result in a kernel
> >> >> crash? What if two successive #MCs occurred for the same page?
> >> >> I.e. ...
> >> >
> >> > Yes, it's another valid scenario, the expected result is kernel crash.
> >>
> >> Kernel _crash_ or rather kernel _panic_? Of course without any kernel
> >> messages we can't tell one from the other, but to me this makes a
> >> difference nevertheless.
> >
> > Exactly, Dom0 crash.
>
> I don't believe a crash is the expected outcome here.

This test case injects two errors to the same dom0 page. During the first
injection, offline_page() is called to set the PGC_broken flag of that page.
During the second injection, offline_page() detects that the same broken
page is touched again, and then tries to shut down the page owner, i.e. dom0
in this case:

    /*
     * NB. When broken page belong to guest, usually hypervisor will
     * notify the guest to handle the broken page. However, hypervisor
     * need to prevent malicious guest access the broken page again.
     * Under such case, hypervisor shutdown guest, preventing recursive mce.
     */
    if ( (pg->count_info & PGC_broken) && (owner = page_get_owner(pg)) )
    {
        *status = PG_OFFLINE_AGAIN;
        domain_shutdown(owner, SHUTDOWN_crash);
        return 0;
    }

So I think the Dom0 crash and the following machine reboot are the expected
behaviors here.
But it looks like an (unexpected) page fault happens during the reboot.
Xudong, can you check whether a normal reboot on that machine triggers a
page fault?

> > And I didn't see any "kernel panic" message from the log -- attach the
> > original log again.
>
> Well, as said - there is _no_ kernel log message at all, and hence we
> can't tell whether it's a crash or a plain panic. Iirc Xen's "Hardware
> Dom0 crashed" can't distinguish the two cases.

The crash is triggered in offline_page() before Xen can inject the error to
Dom0, so there is no dom0 kernel log around the crash. This can be confirmed
by dumping the call trace when hwdom_shutdown(SHUTDOWN_crash) is called.
Xudong, can you do this?

Thanks,
Haozhong

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
On 11/03/17 15:40 +0800, Chao Peng wrote:
> > +/*
> > + * Interface for NVDIMM management.
> > + */
> > +
> > +struct xen_sysctl_nvdimm_op {
> > +    uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
> > +    uint32_t pad; /* IN: Always zero. */
>
> If alignment is the only concern, then err can be moved to here.
>
> If it's designed for future and does not get used now, then it's better
> to check its value explicitly.

I'll move 'err' to the position of 'pad'.
Re: [Xen-devel] [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0
On 11/03/17 14:51 +0800, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > ... to avoid the interference with the PMEM driver and management
> > utilities in Dom0.
> >
> > Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
> > ---
> > Cc: Jan Beulich <jbeul...@suse.com>
> > Cc: Andrew Cooper <andrew.coop...@citrix.com>
> > Cc: Gang Wei <gang@intel.com>
> > Cc: Shane Wang <shane.w...@intel.com>
> > ---
> >  xen/arch/x86/acpi/power.c |  7 +++++++
> >  xen/arch/x86/dom0_build.c |  5 +++++
> >  xen/arch/x86/shutdown.c   |  3 +++
> >  xen/arch/x86/tboot.c      |  4 ++++
> >  xen/common/kexec.c        |  3 +++
> >  xen/common/pmem.c         | 21 +++++++++++++++++++++
> >  xen/drivers/acpi/nfit.c   | 21 +++++++++++++++++++++
> >  xen/include/xen/acpi.h    |  2 ++
> >  xen/include/xen/pmem.h    | 13 +++++++++++++
> >  9 files changed, 79 insertions(+)
> >
> > diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
> > index 1e4e5680a7..d135715a49 100644
> > --- a/xen/arch/x86/acpi/power.c
> > +++ b/xen/arch/x86/acpi/power.c
> > @@ -178,6 +178,10 @@ static int enter_state(u32 state)
> >
> >      freeze_domains();
> >
> > +#ifdef CONFIG_NVDIMM_PMEM
> > +    acpi_nfit_reinstate();
> > +#endif
>
> I don't understand why reinstate is needed for NFIT table? Will it be
> searched by firmware on shutdown / entering power state?

I added these acpi_nfit_reinstate()'s akin to acpi_dmar_reinstate(). There
are no public documents stating whether NFIT is rebuilt during power state
changes.

Haozhong
Re: [Xen-devel] [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT
On 11/03/17 14:15 +0800, Chao Peng wrote:
> > +static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
> > +{
> > +    struct nfit_spa_desc *spa_desc;
> > +    struct nfit_memdev_desc *memdev_desc;
> > +    struct acpi_nfit_system_address *spa;
> > +    unsigned long smfn, emfn;
> > +
> > +    list_for_each_entry(memdev_desc, &desc->memdev_list, link)
> > +    {
> > +        spa_desc = memdev_desc->spa_desc;
> > +
> > +        if ( !spa_desc ||
> > +             (memdev_desc->acpi_table->flags &
> > +              (ACPI_NFIT_MEM_SAVE_FAILED | ACPI_NFIT_MEM_RESTORE_FAILED |
> > +               ACPI_NFIT_MEM_FLUSH_FAILED | ACPI_NFIT_MEM_NOT_ARMED |
> > +               ACPI_NFIT_MEM_MAP_FAILED)) )
> > +            continue;
>
> If failure is detected, is it reasonable to continue? We can print some
> messages at least I think.

I got something wrong here. I should iterate SPA structures, and check all
memdevs in each SPA range. If any memdev contains failure flags, then skip
the whole SPA range and print an error message.

Haozhong

> Chao
>
> > +
> > +        spa = spa_desc->acpi_table;
> > +        if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
> > +            continue;
> > +        smfn = paddr_to_pfn(spa->address);
> > +        emfn = paddr_to_pfn(spa->address + spa->length);
> > +        printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn, emfn);
> > +    }
> > +}
Re: [Xen-devel] [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable
On 11/03/17 13:58 +0800, Chao Peng wrote:
> > +#ifdef CONFIG_NVDIMM_PMEM
> > +static void __init init_frametable_pmem_chunk(unsigned long s,
> > +                                              unsigned long e)
> > +{
> > +    static unsigned long pmem_init_frametable_mfn;
> > +
> > +    ASSERT(!((s | e) & (PAGE_SIZE - 1)));
> > +
> > +    if ( !pmem_init_frametable_mfn )
> > +    {
> > +        pmem_init_frametable_mfn = alloc_boot_pages(1, 1);
> > +        if ( !pmem_init_frametable_mfn )
> > +            panic("Not enough memory for pmem initial frame table page");
> > +        memset(mfn_to_virt(pmem_init_frametable_mfn), -1, PAGE_SIZE);
> > +    }
>
> Can zero_page be used instead?

No. I intend to make the frametable entries for NVDIMM invalid at boot
time, in order to avoid/detect accidental accesses to NVDIMM pages before
they are registered to the Xen hypervisor later (by part 2, patches 14 - 25).

> > +
> > +    while ( s < e )
> > +    {
> > +        /*
> > +         * The real frame table entries of a pmem region will be
> > +         * created when the pmem region is registered to hypervisor.
> > +         * Any write attempt to the initial entries of that pmem
> > +         * region implies potential hypervisor bugs. In order to make
> > +         * those bugs explicit, map those initial entries as read-only.
> > +         */
> > +        map_pages_to_xen(s, pmem_init_frametable_mfn, 1,
> > +                         PAGE_HYPERVISOR_RO);
> > +        s += PAGE_SIZE;
>
> Don't know how much the impact of 4K mapping on boot time when pmem is
> very large. Perhaps we need to get such data on hardware.

Well, it will be very slow because the size of NVDIMM is usually very large
(e.g. from hundreds of gigabytes to several terabytes). I can make it use
huge pages if possible.

> Another question is do we really need to map it, e.g. can we just skip
> the range here?

Sadly, I cannot remember why I did this. Maybe I can just leave the
frametable of NVDIMM unmapped, and accidental accesses to them would just
trigger a page fault in the hypervisor, which makes bugs explicit as well.
Haozhong
Re: [Xen-devel] [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()
On 10/27/17 14:49 +0800, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > The current check refuses the hot-plugged memory that falls in one
> > unused PDX group, which should be allowed.
>
> Looks reasonable to me. The only thing I can think of is you can double
> check if the following find_next_zero_bit/find_next_bit will still
> work.

The first check in mem_hotadd_check() ensures spfn < epfn, so sidx <= eidx
here. Compared with the previous code, the only added case is sidx == eidx,
which is what this patch intends to allow and is tested.

Haozhong

> Chao
>
> > Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
> > ---
> > Cc: Jan Beulich <jbeul...@suse.com>
> > Cc: Andrew Cooper <andrew.coop...@citrix.com>
> > ---
> >  xen/arch/x86/x86_64/mm.c | 6 +-
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> > index 11746730b4..6c5221f90c 100644
> > --- a/xen/arch/x86/x86_64/mm.c
> > +++ b/xen/arch/x86/x86_64/mm.c
> > @@ -1296,12 +1296,8 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
> >          return 0;
> >
> >      /* Make sure the new range is not present now */
> > -    sidx = ((pfn_to_pdx(spfn) + PDX_GROUP_COUNT - 1) & ~(PDX_GROUP_COUNT - 1))
> > -           / PDX_GROUP_COUNT;
> > +    sidx = (pfn_to_pdx(spfn) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;
> >      eidx = (pfn_to_pdx(epfn - 1) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;
> > -    if (sidx >= eidx)
> > -        return 0;
> > -
> >      s = find_next_zero_bit(pdx_group_valid, eidx, sidx);
> >      if ( s > eidx )
> >          return 0;
Re: [Xen-devel] [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains
On 10/27/17 11:26 +0800, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > Overview
> > ========
> >
> > (RFC v2 can be found at
> > https://lists.xen.org/archives/html/xen-devel/2017-03/msg02401.html)
> >
> > Well, this RFC v3 changes and inflates a lot from previous versions.
> > The primary changes are listed below, most of which are to simplify
> > the first implementation and avoid additional inflation.
> >
> > 1. Drop the support to maintain the frametable and M2P table of PMEM
> >    in RAM. In the future, we may add this support back.
>
> I don't find any discussion in v2 about this, but I'm thinking putting
> those Xen data structures in RAM sometimes is useful (e.g. when
> performance is important). It's better not making a hard restriction on
> this.

Well, this is to reduce the complexity, as you see the current patch size
is already too big. In addition, the size of NVDIMM can be very large, e.g.
several terabytes or even more, which would require a large RAM space to
store its frametable and M2P (~10 MB per 1 GB) and leave less RAM for
guest usage.

> > 2. Hide host NFIT and deny access to host PMEM from Dom0. In other
> >    words, the kernel NVDIMM driver is not loaded in Dom0 and existing
> >    management utilities (e.g. ndctl) do not work in Dom0 anymore. This
> >    is to work around the interference of PMEM access between Dom0 and
> >    the Xen hypervisor. In the future, we may add a stub driver in Dom0
> >    which will hold the PMEM pages being used by the Xen hypervisor
> >    and/or other domains.
> >
> > 3. As there is no NVDIMM driver and management utilities in Dom0 now,
> >    we cannot easily specify an area of host NVDIMM (e.g., by
> >    /dev/pmem0) and manage NVDIMM in Dom0 (e.g., creating labels).
> >    Instead, we have to specify the exact MFNs of host PMEM pages in xl
> >    domain configuration files and the newly added Xen NVDIMM
> >    management utility xen-ndctl.
> >
> > If there are indeed some tasks that have to be handled by existing
> > driver and management utilities, such as recovery from hardware
> > failures, they have to be accomplished out of the Xen environment.
>
> What kind of recovery can happen and can the recovery happen at
> runtime? For example, can we recover a portion of NVDIMM assigned to a
> certain VM while keeping other VMs still using NVDIMM?

For example, evaluate ACPI _DSM (maybe vendor specific) for error recovery
and/or scrubbing bad blocks, etc.

> > After 2. is solved in the future, we would be able to make existing
> > driver and management utilities work in Dom0 again.
>
> Is there any reason why we can't do it now? If existing ndctl (with
> additional patches) can work then we don't need to introduce xen-ndctl
> anymore? I think that keeps the user interface clearer.

The simple reason is I want to reduce the components (Xen/kernel/QEMU)
touched by the first patchset (whose primary target is to implement the
basic functionality, i.e. mapping host NVDIMM to a guest as a virtual
NVDIMM). As you said, leaving a driver (the nvdimm driver and/or a stub
driver) in Dom0 would make the user interface clearer. Let's see what I
can get in the next version.

Thanks,
Haozhong
Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
On 10/17/17 13:45 +0200, Paolo Bonzini wrote:
> On 14/10/2017 00:46, Stefano Stabellini wrote:
> > On Fri, 13 Oct 2017, Jan Beulich wrote:
> > > On 13.10.17 at 13:13, wrote:
> > > > To Jan, Andrew, Stefano and Anthony,
> > > >
> > > > what do you think about allowing QEMU to build the entire guest
> > > > ACPI and letting SeaBIOS load it? The ACPI builder code in
> > > > hvmloader is still there and just bypassed in this case.
> > >
> > > Well, if that can be made work in a non-quirky way and without
> > > loss of functionality, I'd probably be fine. I do think, however,
> > > that there's a reason this is being handled in hvmloader right now.
> >
> > And not to discourage you, just as a clarification, you'll also need
> > to consider backward compatibility: unless the tables are identical, I
> > imagine we'll have to keep using the old tables for already installed
> > virtual machines.
>
> I agree. Some of them are already identical, some are not but the QEMU
> version should be okay, and for yet more it's probably better to keep
> the Xen-specific parts in hvmloader.
>
> The good thing is that it's possible to proceed incrementally once you
> have the hvmloader support for merging the QEMU and hvmloader RSDT or
> XSDT (whatever you are using), starting with just NVDIMM and proceeding
> later with whatever you see fit.

I'll have a try to check how much the differences would affect. If it
would not take too much work, I'd like to adapt the Xen NVDIMM enabling
patches to the all-QEMU-built ACPI. Otherwise, I'll fall back to Paolo and
MST's suggestions.

Thanks,
Haozhong
Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
On 10/13/17 10:44 +0200, Igor Mammedov wrote:
> On Fri, 13 Oct 2017 15:53:26 +0800
> Haozhong Zhang <haozhong.zh...@intel.com> wrote:
>
> > On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> > > On 12/10/2017 14:45, Haozhong Zhang wrote:
> > > > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > > > /rom@etc/table-loader. The former is unstructured to guest, and
> > > > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > > > organized as a set of commands, which direct the guest (e.g.,
> > > > SeaBIOS on KVM/QEMU) to relocate data in the former file,
> > > > recalculate checksums of specified areas, and fill guest addresses
> > > > in specified ACPI fields.
> > > >
> > > > One part of my patches is to implement a mechanism to tell Xen
> > > > which part of ACPI data is a table (NFIT), and which part defines a
> > > > namespace device and what the device name is. I can add two new
> > > > loader commands for them respectively.
> > > >
> > > > Because they just provide information and SeaBIOS in a non-Xen
> > > > environment ignores unrecognized commands, they will not break
> > > > SeaBIOS in a non-Xen environment.
> > > >
> > > > On the QEMU side, most Xen-specific hacks in the ACPI builder could
> > > > be dropped, and replaced by adding the new loader commands (though
> > > > they may be used only by Xen).
> > > >
> > > > On the Xen side, a fw_cfg driver and a BIOSLinkerLoader command
> > > > executor are needed in, perhaps, hvmloader.
> > >
> > > If Xen has to parse BIOSLinkerLoader, it can use the existing
> > > commands to process a reduced set of ACPI tables. In other words,
> > > etc/acpi/tables would only include the NFIT, the SSDT with namespace
> > > devices, and the XSDT. etc/acpi/rsdp would include the RSDP table as
> > > usual.
> > >
> > > hvmloader can then:
> > >
> > > 1) allocate some memory for where the XSDT will go
> > >
> > > 2) process the BIOSLinkerLoader like SeaBIOS would do
> > >
> > > 3) find the RSDP in low memory, since the loader script must have
> > > placed it there. If it cannot find it, allocate some low memory, fill
> > > it with the RSDP header and revision, and jump to step 6
> > >
> > > 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> > >
> > > 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or
> > > XSDT.
> > >
> > > 6) build hvmloader tables and link them into the RSDT and/or XSDT as
> > > usual.
> > >
> > > 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> > > RSDT and/or XSDT, and update the checksums
> > >
> > > QEMU's XSDT remains there somewhere in memory, unused but harmless.
>
> +1 to Paolo's suggestion, i.e.
> 1. add BIOSLinkerLoader into hvmloader
> 2. load/process QEMU's tables with #1
> 3. get pointers to QEMU generated NFIT and NVDIMM SSDT from QEMU's
>    RSDT/XSDT and put them in hvmloader's RSDT
>
> > It can work for plain tables which do not contain AML.
> >
> > However, for a namespace device, Xen needs to know its name in order
> > to detect the potential name conflict with those used in Xen-built
> > ACPI. Xen does not (and is not going to) introduce an AML parser, so
> > it cannot get those device names from QEMU-built ACPI on its own.
> >
> > The idea of either this patch series or the new BIOSLinkerLoader
> > command is to let QEMU tell Xen where the definition body of a
> > namespace device (i.e. that part within the outmost "Device(NAME)") is
> > and what the device name is. Xen, after the name conflict check, can
> > re-package the definition body in a namespace device (w/ minimal AML
> > builder code added in Xen) and then in an SSDT.
>
> I'd skip the conflict check at runtime as hvmloader doesn't currently
> have a "\\_SB\NVDR" device, so instead of doing a runtime check it
> might do a primitive check at build time that ASL sources in hvmloader
> do not contain the reserved-for-QEMU "NVDR" keyword, to avoid its
> addition by accident in future. (it also might be reused in future if
> some other tables from QEMU will be reused).
> It's a bit hackish but at least it does the job and keeps the
> BIOSLinkerLoader interface the same for all supported firmwares
> (I'd consider it as a temporary hack on the way to ACPI tables fully
> built by QEMU for Xen).
>
> Ideally it wo
Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
On 10/12/17 13:39 -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Oct 12, 2017 at 08:45:44PM +0800, Haozhong Zhang wrote:
> > On 10/10/17 12:05 -0400, Konrad Rzeszutek Wilk wrote:
> > > On Tue, Sep 12, 2017 at 11:15:09AM +0800, Haozhong Zhang wrote:
> > > > On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> > > > > CC'ing xen-devel, and the Xen tools and x86 maintainers.
> > > > >
> > > > > On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > > > > > On Mon, 11 Sep 2017 12:41:47 +0800
> > > > > > Haozhong Zhang <haozhong.zh...@intel.com> wrote:
> > > > > >
> > > > > > > This is the QEMU part patches that work with the associated
> > > > > > > Xen patches to enable vNVDIMM support for Xen HVM domains.
> > > > > > > Xen relies on QEMU to build guest NFIT and NVDIMM namespace
> > > > > > > devices, and allocate guest address space for vNVDIMM
> > > > > > > devices.
> > > > > > >
> > > > > > > All patches can be found at
> > > > > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > > > > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > > > > >
> > > > > > > Patch 1 is to avoid dereferencing the NULL pointer to
> > > > > > > non-existing label data, as the Xen side support for labels
> > > > > > > is not implemented yet.
> > > > > > >
> > > > > > > Patch 2 & 3 add a memory backend dedicated for Xen usage and
> > > > > > > a hotplug memory region for Xen guest, in order to make the
> > > > > > > existing nvdimm device plugging path work on Xen.
> > > > > > >
> > > > > > > Patch 4 - 10 build and copy NFIT from QEMU to Xen guest,
> > > > > > > when QEMU is used as the Xen device model.
> > > > > >
> > > > > > I've skimmed over the patch-set and can't say that I'm happy
> > > > > > with the number of xen_enabled() invariants it introduced as
> > > > > > well as with the partial blobs it creates.
> > > > >
> > > > > I have not read the series (Haozhong, please CC me, Anthony and
> > > > > xen-devel to the whole series next time), but yes, indeed. Let's
> > > > > not add more xen_enabled() if possible.
> > > > >
> > > > > Haozhong, was there a design document thread on xen-devel about
> > > > > this? If so, did it reach a conclusion? Was the design accepted?
> > > > > If so, please add a link to the design doc in the introductory
> > > > > email, so that everybody can read it and be on the same page.
> > > >
> > > > Yes, there is a design [1] discussed and reviewed. Section 4.3
> > > > discussed the guest ACPI.
> > > >
> > > > [1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html
> > >
> > > Igor, did you have a chance to read it?
> > >
> > > .. see below
> > >
> > > > > > I'd like to reduce above and a way to do this might be making
> > > > > > xen
> > > > > > 1. use fw_cfg
> > > > > > 2. fetch QEMU built acpi tables from fw_cfg
> > > > > > 3. extract nvdimm tables (which is trivial) and use them
> > > > > >
> > > > > > looking at xen_load_linux(), it seems possible to use fw_cfg.
> > > > > >
> > > > > > So what's stopping xen from using it elsewhere?,
> > > > > > instead of adding more xen specific code to do 'the same'
> > > > > > job and not reusing/sharing common code with tcg/kvm.
> > > > >
> > > > > So far, ACPI tables have not been generated by QEMU. Xen HVM
> > > > > machines rely on a firmware-like application called "hvmloader"
> > > > > that runs in guest context and generates the ACPI tables. I have
> > > > > no opinions on hvmloader and I'll let the Xen maintainers talk
> > > > > about it. However, keep in mind that with an HVM guest some
> > > > > devices are emulated
Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> On 12/10/2017 14:45, Haozhong Zhang wrote:
> > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > /rom@etc/table-loader. The former is unstructured to guest, and
> > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > on KVM/QEMU) to relocate data in the former file, recalculate
> > checksums of specified areas, and fill guest addresses in specified
> > ACPI fields.
> >
> > One part of my patches is to implement a mechanism to tell Xen which
> > part of ACPI data is a table (NFIT), and which part defines a
> > namespace device and what the device name is. I can add two new loader
> > commands for them respectively.
> >
> > Because they just provide information and SeaBIOS in a non-Xen
> > environment ignores unrecognized commands, they will not break SeaBIOS
> > in a non-Xen environment.
> >
> > On the QEMU side, most Xen-specific hacks in the ACPI builder could be
> > dropped, and replaced by adding the new loader commands (though they
> > may be used only by Xen).
> >
> > On the Xen side, a fw_cfg driver and a BIOSLinkerLoader command
> > executor are needed in, perhaps, hvmloader.
>
> If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> to process a reduced set of ACPI tables. In other words,
> etc/acpi/tables would only include the NFIT, the SSDT with namespace
> devices, and the XSDT. etc/acpi/rsdp would include the RSDP table as
> usual.
>
> hvmloader can then:
>
> 1) allocate some memory for where the XSDT will go
>
> 2) process the BIOSLinkerLoader like SeaBIOS would do
>
> 3) find the RSDP in low memory, since the loader script must have placed
> it there. If it cannot find it, allocate some low memory, fill it with
> the RSDP header and revision, and jump to step 6
>
> 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
>
> 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
>
> 6) build hvmloader tables and link them into the RSDT and/or XSDT as
> usual.
>
> 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> RSDT and/or XSDT, and update the checksums
>
> QEMU's XSDT remains there somewhere in memory, unused but harmless.

It can work for plain tables which do not contain AML.

However, for a namespace device, Xen needs to know its name in order to
detect the potential name conflict with those used in Xen-built ACPI. Xen
does not (and is not going to) introduce an AML parser, so it cannot get
those device names from QEMU-built ACPI on its own.

The idea of either this patch series or the new BIOSLinkerLoader command
is to let QEMU tell Xen where the definition body of a namespace device
(i.e. that part within the outmost "Device(NAME)") is and what the device
name is. Xen, after the name conflict check, can re-package the definition
body in a namespace device (w/ minimal AML builder code added in Xen) and
then in an SSDT.

Haozhong
Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
On 10/10/17 12:05 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 12, 2017 at 11:15:09AM +0800, Haozhong Zhang wrote:
> > On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> > > CC'ing xen-devel, and the Xen tools and x86 maintainers.
> > >
> > > On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > > > On Mon, 11 Sep 2017 12:41:47 +0800
> > > > Haozhong Zhang <haozhong.zh...@intel.com> wrote:
> > > >
> > > > > This is the QEMU part patches that work with the associated Xen
> > > > > patches to enable vNVDIMM support for Xen HVM domains. Xen
> > > > > relies on QEMU to build guest NFIT and NVDIMM namespace devices,
> > > > > and allocate guest address space for vNVDIMM devices.
> > > > >
> > > > > All patches can be found at
> > > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > > >
> > > > > Patch 1 is to avoid dereferencing the NULL pointer to
> > > > > non-existing label data, as the Xen side support for labels is
> > > > > not implemented yet.
> > > > >
> > > > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a
> > > > > hotplug memory region for Xen guest, in order to make the
> > > > > existing nvdimm device plugging path work on Xen.
> > > > >
> > > > > Patch 4 - 10 build and copy NFIT from QEMU to Xen guest, when
> > > > > QEMU is used as the Xen device model.
> > > >
> > > > I've skimmed over the patch-set and can't say that I'm happy with
> > > > the number of xen_enabled() invariants it introduced as well as
> > > > with the partial blobs it creates.
> > >
> > > I have not read the series (Haozhong, please CC me, Anthony and
> > > xen-devel to the whole series next time), but yes, indeed. Let's not
> > > add more xen_enabled() if possible.
> > >
> > > Haozhong, was there a design document thread on xen-devel about
> > > this? If so, did it reach a conclusion? Was the design accepted? If
> > > so, please add a link to the design doc in the introductory email,
> > > so that everybody can read it and be on the same page.
> >
> > Yes, there is a design [1] discussed and reviewed. Section 4.3
> > discussed the guest ACPI.
> >
> > [1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html
>
> Igor, did you have a chance to read it?
>
> .. see below
>
> > > > I'd like to reduce above and a way to do this might be making xen
> > > > 1. use fw_cfg
> > > > 2. fetch QEMU built acpi tables from fw_cfg
> > > > 3. extract nvdimm tables (which is trivial) and use them
> > > >
> > > > looking at xen_load_linux(), it seems possible to use fw_cfg.
> > > >
> > > > So what's stopping xen from using it elsewhere?,
> > > > instead of adding more xen specific code to do 'the same'
> > > > job and not reusing/sharing common code with tcg/kvm.
> > >
> > > So far, ACPI tables have not been generated by QEMU. Xen HVM
> > > machines rely on a firmware-like application called "hvmloader" that
> > > runs in guest context and generates the ACPI tables. I have no
> > > opinions on hvmloader and I'll let the Xen maintainers talk about
> > > it. However, keep in mind that with an HVM guest some devices are
> > > emulated by Xen and/or by other device emulators that can run
> > > alongside QEMU. QEMU doesn't have a full view of the system.
> > >
> > > Here the question is: does it have to be QEMU the one to generate
> > > the ACPI blobs for the nvdimm? It would be nicer if it was up to
> > > hvmloader like the rest, instead of introducing this split-brain
> > > design about ACPI. We need to see a design doc to fully understand
> > > this.
> >
> > hvmloader runs in the guest and is responsible for building/loading
> > the guest ACPI. However, it's not capable of building AML at runtime
> > (for the lack of an AML builder). If any guest ACPI object is needed
> > (e.g. by the guest DSDT), it has to be generated from ASL by iasl at
> > Xen compile time and then be loaded by hvmloader at runtime.
> >
> > Xen includes an OperationRegion "BIOS" in the statically generated
> > guest DSDT, whose address is hardcoded and which contains a list of
> > values fi
[Xen-devel] [PATCH v2] VT-d: use two 32-bit writes to update DMAR fault address registers
The 64-bit DMAR fault address is composed of two 32-bit registers,
DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to the VT-d spec:
"Software is expected to access 32-bit registers as aligned doublewords",
a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG and
DMAR_FEUADDR_REG separately in order to update a 64-bit fault address,
rather than a single 64-bit write to DMAR_FEADDR_REG.

Note that when x2APIC is not enabled, DMAR_FEUADDR_REG is reserved and
it's not necessary to update it.

Though I haven't seen any errors caused by such a 64-bit write on real
machines, it's still better to follow the specification.

Fixes: ae05fd3912b ("VT-d: use qword MMIO access for MSI address writes")
Reviewed-by: Roger Pau Monné <roger@citrix.com>
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Changes in v2:
 * Explain in commit message and code comment why not updating
   DMAR_FEUADDR_REG when x2APIC is not enabled

This patch actually reverts part of commit ae05fd3912b ("VT-d: use qword
MMIO access for MSI address writes"). The latter was included in the
XSA-120, 128..131 follow-up patch series [1]. I don't know whether my
patch breaks those XSA fixes. If it does, please drop my patch.

[1] https://lists.xenproject.org/archives/html/xen-devel/2015-06/msg00638.html
---
 xen/drivers/passthrough/vtd/iommu.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index daaed0abbd..81dd2085c7 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1105,7 +1105,13 @@ static void dma_msi_set_affinity(struct irq_desc *desc, const cpumask_t *mask)
 
     spin_lock_irqsave(&iommu->register_lock, flags);
     dmar_writel(iommu->reg, DMAR_FEDATA_REG, msg.data);
-    dmar_writeq(iommu->reg, DMAR_FEADDR_REG, msg.address);
+    dmar_writel(iommu->reg, DMAR_FEADDR_REG, msg.address_lo);
+    /*
+     * When x2APIC is not enabled, DMAR_FEUADDR_REG is reserved and
+     * it's not necessary to update it.
+     */
+    if ( x2apic_enabled )
+        dmar_writel(iommu->reg, DMAR_FEUADDR_REG, msg.address_hi);
     spin_unlock_irqrestore(&iommu->register_lock, flags);
 }
-- 
2.11.0
Re: [Xen-devel] [PATCH] vt-d: use two 32-bit writes to update DMAR fault address registers
On 09/18/17 02:30 -0600, Jan Beulich wrote: > >>> On 18.09.17 at 10:18,wrote: > >> From: Jan Beulich [mailto:jbeul...@suse.com] > >> Sent: Monday, September 11, 2017 6:03 PM > >> > >> >>> On 11.09.17 at 08:00, wrote: > >> > The 64-bit DMAR fault address is composed of two 32 bits registers > >> > DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to VT-d spec: > >> > "Software is expected to access 32-bit registers as aligned doublewords", > >> > a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG and > >> > DMAR_FEUADDR_REG separately in order to update a 64-bit fault > >> address, > >> > rather than a 64-bit write to DMAR_FEADDR_REG. > >> > > >> > Though I haven't seen any errors caused by such one 64-bit write on > >> > real machines, it's still better to follow the specification. > >> > >> Any sane chipset should split qword accesses into dword ones if > >> they can't be handled at some layer. Also if you undo something > >> explicitly done by an earlier commit, please quote that commit > >> and say what was wrong. After all Kevin as the VT-d maintainer > >> agreed with the change back then. > > > > I'm OK with this change. > > Hmm, would you mind explaining? You were also okay with the > change in the opposite direction back then, and we've had no > reports of problems. > I haven't seen any issues of the current 64-bit write on recent Intel Haswell, Broadwell and Skylake Xeon platforms, so I guess the hardware can properly handle the 64-bits write to contiguous 32-bit registers. I actually encountered errors when running Xen on KVM/QEMU with QEMU vIOMMU enabled, which (QEMU) disallows 64-bit writes to 32-bit registers and aborts if such writes happen. If this patch is considered senseless (as it does not fix any errors on real hardware), I'm fine to fix the above abort on QEMU side (i.e., let vIOMMU in QEMU follow the behavior of real hardware). Haozhong ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> CC'ing xen-devel, and the Xen tools and x86 maintainers.
>
> On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > On Mon, 11 Sep 2017 12:41:47 +0800
> > Haozhong Zhang <haozhong.zh...@intel.com> wrote:
> >
> > > This is the QEMU part of the patches that work with the associated Xen
> > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > guest address space for vNVDIMM devices.
> > >
> > > All patches can be found at
> > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > >
> > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > label data, as the Xen side support for labels is not implemented yet.
> > >
> > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > > memory region for Xen guest, in order to make the existing nvdimm
> > > device plugging path work on Xen.
> > >
> > > Patch 4 - 10 build and copy NFIT from QEMU to the Xen guest, when QEMU
> > > is used as the Xen device model.
> >
> > I've skimmed over the patch-set and can't say that I'm happy with the
> > number of xen_enabled() invariants it introduced, as well as with the
> > partial blobs it creates.
>
> I have not read the series (Haozhong, please CC me, Anthony and
> xen-devel to the whole series next time), but yes, indeed. Let's not add
> more xen_enabled() if possible.
>
> Haozhong, was there a design document thread on xen-devel about this? If
> so, did it reach a conclusion? Was the design accepted? If so, please
> add a link to the design doc in the introductory email, so that
> everybody can read it and be on the same page.

Yes, there is a design [1] that was discussed and reviewed. Section 4.3 discusses the guest ACPI.
[1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html

> > I'd like to reduce the above, and a way to do this might be making xen
> >  1. use fw_cfg
> >  2. fetch QEMU-built acpi tables from fw_cfg
> >  3. extract nvdimm tables (which is trivial) and use them
> >
> > Looking at xen_load_linux(), it seems possible to use fw_cfg.
> >
> > So what's stopping xen from using it elsewhere, instead of adding more
> > xen specific code to do 'the same' job and not reusing/sharing common
> > code with tcg/kvm?
>
> So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> rely on a firmware-like application called "hvmloader" that runs in
> guest context and generates the ACPI tables. I have no opinions on
> hvmloader and I'll let the Xen maintainers talk about it. However, keep
> in mind that with an HVM guest some devices are emulated by Xen and/or
> by other device emulators that can run alongside QEMU. QEMU doesn't have
> a full view of the system.
>
> Here the question is: does it have to be QEMU the one to generate the
> ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> like the rest, instead of introducing this split-brain design about
> ACPI. We need to see a design doc to fully understand this.

hvmloader runs in the guest and is responsible for building/loading the guest ACPI. However, it's not capable of building AML at runtime (for lack of an AML builder). If any guest ACPI object is needed (e.g. by the guest DSDT), it has to be generated from ASL by iasl at Xen compile time and then be loaded by hvmloader at runtime.

Xen includes an OperationRegion "BIOS" in the statically generated guest DSDT, whose address is hardcoded and which contains a list of values filled in by hvmloader at runtime. Other ACPI objects can refer to those values (e.g., the number of vCPUs).
But this is not enough for guest NVDIMM ACPI objects: they cannot be generated at compile time and then customized and loaded by hvmloader, because their structure (i.e., the number of namespace devices) cannot be decided until the guest config is known.

Alternatively, we could introduce an AML builder in hvmloader and build all guest ACPI completely in hvmloader. Looking at the similar implementation in QEMU, it would not be small compared to the current size of hvmloader. Besides, I'm still going to let QEMU handle guest NVDIMM _DSM and _FIT calls, which is another reason I use QEMU to build the NVDIMM ACPI.

> If the design doc thread led into thinking that it has to be QEMU to
> generate them, then would it make the code nicer if we used fw_cfg to
> get the (full or partial) tables from QEMU, as Igor suggested?

I'll have a look at the code (which I didn't notice) pointed out by Igor. One possible issue t
Re: [Xen-devel] [PATCH] vt-d: use two 32-bit writes to update DMAR fault address registers
On 09/11/17 10:38 +0100, Roger Pau Monné wrote:
> On Mon, Sep 11, 2017 at 02:00:48PM +0800, Haozhong Zhang wrote:
> > The 64-bit DMAR fault address is composed of two 32-bit registers,
> > DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to the VT-d spec:
> > "Software is expected to access 32-bit registers as aligned doublewords",
> > a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG and
> > DMAR_FEUADDR_REG separately in order to update a 64-bit fault address,
> > rather than one 64-bit write to DMAR_FEADDR_REG.
> >
> > Though I haven't seen any errors caused by such a 64-bit write on
> > real machines, it's still better to follow the specification.
>
> Either the patch description is missing something or the patch is
> wrong. You should mention why the write to the high part of the
> address is now conditional on x2APIC being enabled, when it didn't
> use to be before.

When x2APIC is disabled, DMAR_FEUADDR_REG is reserved and it's not necessary to update it. The original code always writes zero to it in that case, which is also correct.

Haozhong

> [...]
> > -    dmar_writeq(iommu->reg, DMAR_FEADDR_REG, msg.address);
> > +    dmar_writel(iommu->reg, DMAR_FEADDR_REG, msg.address_lo);
> > +    if (x2apic_enabled)
> > +        dmar_writel(iommu->reg, DMAR_FEUADDR_REG, msg.address_hi);
> >      spin_unlock_irqrestore(&iommu->register_lock, flags);
>
> Thanks, Roger.
[Xen-devel] [PATCH 6/6] x86/mce: remove extra blanks in mctelem.c
The entire file of mctelem.c is in Linux coding style, so do not change the coding style and only remove trailing spaces and extra blank lines. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- xen/arch/x86/cpu/mcheck/mctelem.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c b/xen/arch/x86/cpu/mcheck/mctelem.c index b63e559d4d..492e2af77f 100644 --- a/xen/arch/x86/cpu/mcheck/mctelem.c +++ b/xen/arch/x86/cpu/mcheck/mctelem.c @@ -220,7 +220,7 @@ void mctelem_process_deferred(unsigned int cpu, int ret; /* -* First, unhook the list of telemetry structures, and +* First, unhook the list of telemetry structures, and * hook it up to the processing list head for this CPU. * * If @lmce is true and a non-local MC# occurs before the @@ -339,7 +339,7 @@ void __init mctelem_init(unsigned int datasz) { char *datarr; unsigned int i; - + BUILD_BUG_ON(MC_URGENT != 0 || MC_NONURGENT != 1 || MC_NCLASSES != 2); datasz = (datasz & ~0xf) + 0x10;/* 16 byte roundup */ -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH 5/6] x86/mce: add emacs block to mctelem.c
mctelem.c uses the tab indention. Add an emacs block to avoid mixed indention styles in certain editors. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- xen/arch/x86/cpu/mcheck/mctelem.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c b/xen/arch/x86/cpu/mcheck/mctelem.c index b144a66053..b63e559d4d 100644 --- a/xen/arch/x86/cpu/mcheck/mctelem.c +++ b/xen/arch/x86/cpu/mcheck/mctelem.c @@ -550,3 +550,13 @@ void mctelem_ack(mctelem_class_t which, mctelem_cookie_t cookie) wmb(); spin_unlock(_lock); } + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * indent-tabs-mode: t + * tab-width: 8 + * End: + */ -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/6] x86/vmce: adapt vmce.c to Xen hypervisor coding style
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- xen/arch/x86/cpu/mcheck/vmce.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c index 9c460c7c6c..e07cd2feef 100644 --- a/xen/arch/x86/cpu/mcheck/vmce.c +++ b/xen/arch/x86/cpu/mcheck/vmce.c @@ -185,7 +185,7 @@ int vmce_rdmsr(uint32_t msr, uint64_t *val) { case MSR_IA32_MCG_STATUS: *val = cur->arch.vmce.mcg_status; -if (*val) +if ( *val ) mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_STATUS %#"PRIx64"\n", cur, *val); break; @@ -354,7 +354,8 @@ static int vmce_save_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h) struct vcpu *v; int err = 0; -for_each_vcpu( d, v ) { +for_each_vcpu ( d, v ) +{ struct hvm_vmce_vcpu ctxt = { .caps = v->arch.vmce.mcg_cap, .mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2, -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH 4/6] x86/mce: adapt mce_intel.c to Xen hypervisor coding style
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- xen/arch/x86/cpu/mcheck/mce_intel.c | 262 +++- 1 file changed, 142 insertions(+), 120 deletions(-) diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c index 4c001b407f..e5dd956a24 100644 --- a/xen/arch/x86/cpu/mcheck/mce_intel.c +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c @@ -7,7 +7,7 @@ #include #include #include -#include +#include #include #include #include @@ -64,7 +64,7 @@ static void intel_thermal_interrupt(struct cpu_user_regs *regs) ack_APIC_irq(); -if (NOW() < per_cpu(next, cpu)) +if ( NOW() < per_cpu(next, cpu) ) return; per_cpu(next, cpu) = NOW() + MILLISECS(5000); @@ -78,17 +78,16 @@ static void intel_thermal_interrupt(struct cpu_user_regs *regs) printk(KERN_EMERG "CPU%u: Temperature above threshold\n", cpu); printk(KERN_EMERG "CPU%u: Running in modulated clock mode\n", cpu); add_taint(TAINT_MACHINE_CHECK); -} else { +} else printk(KERN_INFO "CPU%u: Temperature/speed normal\n", cpu); -} } /* Thermal monitoring depends on APIC, ACPI and clock modulation */ static bool intel_thermal_supported(struct cpuinfo_x86 *c) { -if (!cpu_has_apic) +if ( !cpu_has_apic ) return false; -if (!cpu_has(c, X86_FEATURE_ACPI) || !cpu_has(c, X86_FEATURE_TM1)) +if ( !cpu_has(c, X86_FEATURE_ACPI) || !cpu_has(c, X86_FEATURE_TM1) ) return false; return true; } @@ -102,7 +101,7 @@ static void __init mcheck_intel_therm_init(void) * LVT value on BSP and use that value to restore APs' thermal LVT * entry BIOS programmed later */ -if (intel_thermal_supported(_cpu_data)) +if ( intel_thermal_supported(_cpu_data) ) lvtthmr_init = apic_read(APIC_LVTTHMR); } @@ -115,7 +114,7 @@ static void intel_init_thermal(struct cpuinfo_x86 *c) unsigned int cpu = smp_processor_id(); static uint8_t thermal_apic_vector; -if (!intel_thermal_supported(c)) +if ( !intel_thermal_supported(c) ) return; /* -ENODEV */ /* first check if its enabled already, in which case there might @@ -134,23 +133,25 @@ static void 
intel_init_thermal(struct cpuinfo_x86 *c) * BIOS has programmed on AP based on BSP's info we saved (since BIOS * is required to set the same value for all threads/cores). */ -if ((val & APIC_MODE_MASK) != APIC_DM_FIXED -|| (val & APIC_VECTOR_MASK) > 0xf) +if ( (val & APIC_MODE_MASK) != APIC_DM_FIXED + || (val & APIC_VECTOR_MASK) > 0xf ) apic_write(APIC_LVTTHMR, val); -if ((msr_content & (1ULL<<3)) -&& (val & APIC_MODE_MASK) == APIC_DM_SMI) { -if (c == _cpu_data) +if ( (msr_content & (1ULL<<3)) + && (val & APIC_MODE_MASK) == APIC_DM_SMI ) +{ +if ( c == _cpu_data ) printk(KERN_DEBUG "Thermal monitoring handled by SMI\n"); return; /* -EBUSY */ } -if (cpu_has(c, X86_FEATURE_TM2) && (msr_content & (1ULL << 13))) +if ( cpu_has(c, X86_FEATURE_TM2) && (msr_content & (1ULL << 13)) ) tm2 = 1; /* check whether a vector already exists, temporarily masked? */ -if (val & APIC_VECTOR_MASK) { -if (c == _cpu_data) +if ( val & APIC_VECTOR_MASK ) +{ +if ( c == _cpu_data ) printk(KERN_DEBUG "Thermal LVT vector (%#x) already installed\n", val & APIC_VECTOR_MASK); return; /* -EBUSY */ @@ -170,9 +171,9 @@ static void intel_init_thermal(struct cpuinfo_x86 *c) wrmsrl(MSR_IA32_MISC_ENABLE, msr_content | (1ULL<<3)); apic_write(APIC_LVTTHMR, val & ~APIC_LVT_MASKED); -if (opt_cpu_info) +if ( opt_cpu_info ) printk(KERN_INFO "CPU%u: Thermal monitoring enabled (%s)\n", -cpu, tm2 ? "TM2" : "TM1"); + cpu, tm2 ? 
"TM2" : "TM1"); return; } #endif /* CONFIG_X86_MCE_THERMAL */ @@ -181,7 +182,8 @@ static void intel_init_thermal(struct cpuinfo_x86 *c) static inline void intel_get_extended_msr(struct mcinfo_extended *ext, u32 msr) { if ( ext->mc_msrs < ARRAY_SIZE(ext->mc_msr) - && msr < MSR_IA32_MCG_EAX + nr_intel_ext_msrs ) { + && msr < MSR_IA32_MCG_EAX + nr_intel_ext_msrs ) +{ ext->mc_msr[ext->mc_msrs].reg = msr; rdmsrl(msr, ext->mc_msr[ext->mc_msrs].value); ++ext->mc_msrs; @@ -199,21 +201,21 @@ intel_get_extended_msrs(struct mcinfo_global *mig, struct mc_info *mi) * According to spec, processor _support_ 64 bit will always * have MSR beyond IA32_MCG_MISC */ -if (!mi|| !mig || nr_intel_ext_msrs == 0 ||
[Xen-devel] [PATCH 0/6] mce: fix coding style
Some files in xen/arch/x86/cpu/mcheck use mixed coding styles. Unify them to Xen hypervisor coding style. For mctelem.c which is entirely in one coding style, only remove extra blanks. No functional change is introduced. Haozhong Zhang (6): x86/mce: adapt mce.{c,h} to Xen hypervisor coding style x86/vmce: adapt vmce.c to Xen hypervisor coding style x86/mce: adapt mcation.c to Xen hypervisor coding style x86/mce: adapt mce_intel.c to Xen hypervisor coding style x86/mce: add emacs block to mctelem.c x86/mce: remove trailing spaces in mctelem.c xen/arch/x86/cpu/mcheck/mcaction.c | 74 ++--- xen/arch/x86/cpu/mcheck/mce.c | 536 xen/arch/x86/cpu/mcheck/mce.h | 21 +- xen/arch/x86/cpu/mcheck/mce_intel.c | 262 ++ xen/arch/x86/cpu/mcheck/mctelem.c | 14 +- xen/arch/x86/cpu/mcheck/vmce.c | 5 +- 6 files changed, 509 insertions(+), 403 deletions(-) -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/6] x86/mce: adapt mce.{c, h} to Xen hypervisor coding style
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- xen/arch/x86/cpu/mcheck/mce.c | 536 +++--- xen/arch/x86/cpu/mcheck/mce.h | 21 +- 2 files changed, 311 insertions(+), 246 deletions(-) diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c index 7affe2591e..580e68d6f2 100644 --- a/xen/arch/x86/cpu/mcheck/mce.c +++ b/xen/arch/x86/cpu/mcheck/mce.c @@ -64,7 +64,7 @@ struct mca_banks *mca_allbanks; int mce_verbosity; static int __init mce_set_verbosity(const char *str) { -if (strcmp("verbose", str) == 0) +if ( strcmp("verbose", str) == 0 ) mce_verbosity = MCE_VERBOSE; else return -EINVAL; @@ -81,7 +81,6 @@ static void unexpected_machine_check(const struct cpu_user_regs *regs) fatal_trap(regs, 1); } - static x86_mce_vector_t _machine_check_vector = unexpected_machine_check; void x86_mce_vector_register(x86_mce_vector_t hdlr) @@ -97,11 +96,13 @@ void do_machine_check(const struct cpu_user_regs *regs) _machine_check_vector(regs); } -/* Init machine check callback handler +/* + * Init machine check callback handler * It is used to collect additional information provided by newer * CPU families/models without the need to duplicate the whole handler. * This avoids having many handlers doing almost nearly the same and each - * with its own tweaks ands bugs. */ + * with its own tweaks ands bugs. 
+ */ static x86_mce_callback_t mc_callback_bank_extended = NULL; void x86_mce_callback_register(x86_mce_callback_t cbfunc) @@ -109,7 +110,8 @@ void x86_mce_callback_register(x86_mce_callback_t cbfunc) mc_callback_bank_extended = cbfunc; } -/* Machine check recoverable judgement callback handler +/* + * Machine check recoverable judgement callback handler * It is used to judge whether an UC error is recoverable by software */ static mce_recoverable_t mc_recoverable_scan = NULL; @@ -124,12 +126,12 @@ struct mca_banks *mcabanks_alloc(void) struct mca_banks *mb; mb = xmalloc(struct mca_banks); -if (!mb) +if ( !mb ) return NULL; mb->bank_map = xzalloc_array(unsigned long, BITS_TO_LONGS(nr_mce_banks)); -if (!mb->bank_map) +if ( !mb->bank_map ) { xfree(mb); return NULL; @@ -142,9 +144,9 @@ struct mca_banks *mcabanks_alloc(void) void mcabanks_free(struct mca_banks *banks) { -if (banks == NULL) +if ( banks == NULL ) return; -if (banks->bank_map) +if ( banks->bank_map ) xfree(banks->bank_map); xfree(banks); } @@ -155,15 +157,16 @@ static void mcabank_clear(int banknum) status = mca_rdmsr(MSR_IA32_MCx_STATUS(banknum)); -if (status & MCi_STATUS_ADDRV) +if ( status & MCi_STATUS_ADDRV ) mca_wrmsr(MSR_IA32_MCx_ADDR(banknum), 0x0ULL); -if (status & MCi_STATUS_MISCV) +if ( status & MCi_STATUS_MISCV ) mca_wrmsr(MSR_IA32_MCx_MISC(banknum), 0x0ULL); mca_wrmsr(MSR_IA32_MCx_STATUS(banknum), 0x0ULL); } -/* Judging whether to Clear Machine Check error bank callback handler +/* + * Judging whether to Clear Machine Check error bank callback handler * According to Intel latest MCA OS Recovery Writer's Guide, * whether the error MCA bank needs to be cleared is decided by the mca_source * and MCi_status bit value. 
@@ -188,17 +191,15 @@ const struct mca_error_handler *__read_mostly mce_uhandlers; unsigned int __read_mostly mce_dhandler_num; unsigned int __read_mostly mce_uhandler_num; - -static void mca_init_bank(enum mca_source who, -struct mc_info *mi, int bank) +static void mca_init_bank(enum mca_source who, struct mc_info *mi, int bank) { struct mcinfo_bank *mib; -if (!mi) +if ( !mi ) return; mib = x86_mcinfo_reserve(mi, sizeof(*mib), MC_TYPE_BANK); -if (!mib) +if ( !mib ) { mi->flags |= MCINFO_FLAGS_UNCOMPLETE; return; @@ -209,26 +210,27 @@ static void mca_init_bank(enum mca_source who, mib->mc_bank = bank; mib->mc_domid = DOMID_INVALID; -if (mib->mc_status & MCi_STATUS_MISCV) +if ( mib->mc_status & MCi_STATUS_MISCV ) mib->mc_misc = mca_rdmsr(MSR_IA32_MCx_MISC(bank)); -if (mib->mc_status & MCi_STATUS_ADDRV) +if ( mib->mc_status & MCi_STATUS_ADDRV ) mib->mc_addr = mca_rdmsr(MSR_IA32_MCx_ADDR(bank)); -if ((mib->mc_status & MCi_STATUS_MISCV) && -(mib->mc_status & MCi_STATUS_ADDRV) && -(mc_check_addr(mib->mc_status, mib->mc_misc, MC_ADDR_PHYSICAL)) && -(who == MCA_POLLER || who == MCA_CMCI_HANDLER) && -(mfn_valid(_mfn(paddr_to_pfn(mib->mc_addr) +if ( (mib->mc_status & MCi_STATUS_MISCV) && + (mib->mc_status & MCi_STATUS_ADDRV) &&
[Xen-devel] [PATCH 3/6] x86/mce: adapt mcation.c to Xen hypervisor coding style
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- xen/arch/x86/cpu/mcheck/mcaction.c | 74 +- 1 file changed, 41 insertions(+), 33 deletions(-) diff --git a/xen/arch/x86/cpu/mcheck/mcaction.c b/xen/arch/x86/cpu/mcheck/mcaction.c index f959bed2cb..e42267414e 100644 --- a/xen/arch/x86/cpu/mcheck/mcaction.c +++ b/xen/arch/x86/cpu/mcheck/mcaction.c @@ -6,15 +6,16 @@ static struct mcinfo_recovery * mci_action_add_pageoffline(int bank, struct mc_info *mi, - uint64_t mfn, uint32_t status) + uint64_t mfn, uint32_t status) { struct mcinfo_recovery *rec; -if (!mi) +if ( !mi ) return NULL; rec = x86_mcinfo_reserve(mi, sizeof(*rec), MC_TYPE_RECOVERY); -if (!rec) { +if ( !rec ) +{ mi->flags |= MCINFO_FLAGS_UNCOMPLETE; return NULL; } @@ -46,14 +47,15 @@ mc_memerr_dhandler(struct mca_binfo *binfo, int vmce_vcpuid; unsigned int mc_vcpuid; -if (!mc_check_addr(bank->mc_status, bank->mc_misc, MC_ADDR_PHYSICAL)) { +if ( !mc_check_addr(bank->mc_status, bank->mc_misc, MC_ADDR_PHYSICAL) ) +{ dprintk(XENLOG_WARNING, -"No physical address provided for memory error\n"); +"No physical address provided for memory error\n"); return; } mfn = bank->mc_addr >> PAGE_SHIFT; -if (offline_page(mfn, 1, )) +if ( offline_page(mfn, 1, ) ) { dprintk(XENLOG_WARNING, "Failed to offline page %lx for MCE error\n", mfn); @@ -63,21 +65,26 @@ mc_memerr_dhandler(struct mca_binfo *binfo, mci_action_add_pageoffline(binfo->bank, binfo->mi, mfn, status); /* This is free page */ -if (status & PG_OFFLINE_OFFLINED) +if ( status & PG_OFFLINE_OFFLINED ) *result = MCER_RECOVERED; -else if (status & PG_OFFLINE_AGAIN) +else if ( status & PG_OFFLINE_AGAIN ) *result = MCER_CONTINUE; -else if (status & PG_OFFLINE_PENDING) { +else if ( status & PG_OFFLINE_PENDING ) +{ /* This page has owner */ -if (status & PG_OFFLINE_OWNED) { +if ( status & PG_OFFLINE_OWNED ) +{ bank->mc_domid = status >> PG_OFFLINE_OWNER_SHIFT; mce_printk(MCE_QUIET, "MCE: This error page is ownded" - " by DOM %d\n", bank->mc_domid); -/* XXX: Cannot 
handle shared pages yet + " by DOM %d\n", bank->mc_domid); +/* + * XXX: Cannot handle shared pages yet * (this should identify all domains and gfn mapping to - * the mfn in question) */ + * the mfn in question) + */ BUG_ON( bank->mc_domid == DOMID_COW ); -if ( bank->mc_domid != DOMID_XEN ) { +if ( bank->mc_domid != DOMID_XEN ) +{ d = get_domain_by_id(bank->mc_domid); ASSERT(d); gfn = get_gpfn_from_mfn((bank->mc_addr) >> PAGE_SHIFT); @@ -85,45 +92,46 @@ mc_memerr_dhandler(struct mca_binfo *binfo, if ( unmmap_broken_page(d, _mfn(mfn), gfn) ) { printk("Unmap broken memory %lx for DOM%d failed\n", -mfn, d->domain_id); + mfn, d->domain_id); goto vmce_failed; } mc_vcpuid = global->mc_vcpuid; -if (mc_vcpuid == XEN_MC_VCPUID_INVALID || -/* - * Because MC# may happen asynchronously with the actual - * operation that triggers the error, the domain ID as - * well as the vCPU ID collected in 'global' at MC# are - * not always precise. In that case, fallback to broadcast. - */ -global->mc_domid != bank->mc_domid || -(boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && - (!(global->mc_gstatus & MCG_STATUS_LMCE) || - !(d->vcpu[mc_vcpuid]->arch.vmce.mcg_ext_ctl & -MCG_EXT_CTL_LMCE_EN +if ( mc_vcpuid == XEN_MC_VCPUID_INVALID || + /* + * Because MC# may happen asynchronously with the actual + * operation that triggers the error, the domain ID as + * well as the vCPU ID collected in 'global' at MC# are + * not always precise. In that case, fallback to broadcast. + */ + global->mc_domid != bank->mc_domid || + (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &am
[Xen-devel] [PATCH] vt-d: use two 32-bit writes to update DMAR fault address registers
The 64-bit DMAR fault address is composed of two 32-bit registers, DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to the VT-d spec: "Software is expected to access 32-bit registers as aligned doublewords", a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG and DMAR_FEUADDR_REG separately in order to update a 64-bit fault address, rather than one 64-bit write to DMAR_FEADDR_REG.

Though I haven't seen any errors caused by such a 64-bit write on real machines, it's still better to follow the specification.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index daaed0abbd..067c092214 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1105,7 +1105,9 @@ static void dma_msi_set_affinity(struct irq_desc *desc, const cpumask_t *mask)
     spin_lock_irqsave(&iommu->register_lock, flags);
     dmar_writel(iommu->reg, DMAR_FEDATA_REG, msg.data);
-    dmar_writeq(iommu->reg, DMAR_FEADDR_REG, msg.address);
+    dmar_writel(iommu->reg, DMAR_FEADDR_REG, msg.address_lo);
+    if (x2apic_enabled)
+        dmar_writel(iommu->reg, DMAR_FEUADDR_REG, msg.address_hi);
     spin_unlock_irqrestore(&iommu->register_lock, flags);
 }
--
2.11.0
Re: [Xen-devel] [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
On 09/10/17 22:10 -0700, Dan Williams wrote:
> On Sun, Sep 10, 2017 at 9:37 PM, Haozhong Zhang
> <haozhong.zh...@intel.com> wrote:
> > The kernel NVDIMM driver and the traditional NVDIMM management
> > utilities in Dom0 do not work now. 'xen-ndctl' is added as an
> > alternative, which manages NVDIMM via Xen hypercalls.
> >
> > Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
> > ---
> > Cc: Ian Jackson <ian.jack...@eu.citrix.com>
> > Cc: Wei Liu <wei.l...@citrix.com>
> > ---
> >  .gitignore             |   1 +
> >  tools/misc/Makefile    |   4 ++
> >  tools/misc/xen-ndctl.c | 172 +
> >  3 files changed, 177 insertions(+)
> >  create mode 100644 tools/misc/xen-ndctl.c
>
> What about my offer to move this functionality into the upstream ndctl
> utility [1]? I think it is thoroughly confusing that you are reusing
> the name 'ndctl' and avoiding integration with the upstream ndctl
> utility.
>
> [1]: https://patchwork.kernel.org/patch/9632865/

I don't object to integrating it with upstream ndctl. My only concern is that the integration would introduce two types of user interface. The upstream ndctl works with the kernel driver and provides easily used *names* (e.g., namespace0.0, region0, nmem0, etc.) for user input. However, this version of the patchset hides NFIT from Dom0 (to simplify the first implementation), so the kernel driver does not work in Dom0, and neither does ndctl. Instead, xen-ndctl has to use *the physical address* for users to specify their interested NVDIMM region, which is different from upstream ndctl.

Haozhong
[Xen-devel] [RFC QEMU PATCH v3 07/10] nvdimm acpi: copy NFIT to Xen guest
Xen relies on QEMU to build the guest NFIT. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: "Michael S. Tsirkin" <m...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: Xiao Guangrong <xiaoguangrong.e...@gmail.com> --- hw/acpi/nvdimm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c index 9121a766c6..d9cdc5a531 100644 --- a/hw/acpi/nvdimm.c +++ b/hw/acpi/nvdimm.c @@ -404,6 +404,12 @@ static void nvdimm_build_nfit(AcpiNVDIMMState *state, GArray *table_offsets, build_header(linker, table_data, (void *)(table_data->data + header), "NFIT", sizeof(NvdimmNfitHeader) + fit_buf->fit->len, 1, NULL, NULL); + +if (xen_enabled()) { +xen_acpi_copy_to_guest("NFIT", table_data->data + header, + sizeof(NvdimmNfitHeader) + fit_buf->fit->len, + XEN_DM_ACPI_BLOB_TYPE_TABLE); +} } #define NVDIMM_DSM_MEMORY_SIZE 4096 -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC QEMU PATCH v3 01/10] nvdimm: do not initialize nvdimm->label_data if label size is zero
The memory region of vNVDIMM on Xen is not a RAM memory region, so memory_region_get_ram_ptr() cannot be used in nvdimm_realize() to get a pointer to the label data area in that region. Worse, it may abort QEMU. As Xen currently does not support labels (i.e. label size is 0) and every access in QEMU to labels is guarded by a label size check, let's not initialize nvdimm->label_data if the label size is 0.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.e...@gmail.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
---
 hw/mem/nvdimm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 952fce5ec8..3e58538b99 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -87,7 +87,15 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
     align = memory_region_get_alignment(mr);

     pmem_size = size - nvdimm->label_size;
-    nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
+    /*
+     * The memory region of vNVDIMM on Xen is not a RAM memory region,
+     * so memory_region_get_ram_ptr() below will abort QEMU. In
+     * addition, as Xen currently does not support vNVDIMM labels
+     * (i.e. label_size is zero here), let's not initialize the
+     * pointer to label data if the label size is zero.
+     */
+    if (nvdimm->label_size)
+        nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
     pmem_size = QEMU_ALIGN_DOWN(pmem_size, align);

     if (size <= nvdimm->label_size || !pmem_size) {
--
2.11.0
[Xen-devel] [RFC QEMU PATCH v3 10/10] hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled
If the machine option 'nvdimm' is enabled and QEMU is used as Xen device model, construct the guest NFIT and ACPI namespace devices of vNVDIMM and copy them into guest memory. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: "Michael S. Tsirkin" <m...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> Cc: Stefano Stabellini <sstabell...@kernel.org> Cc: Anthony Perard <anthony.per...@citrix.com> --- hw/acpi/aml-build.c | 10 +++--- hw/i386/pc.c | 16 ++-- hw/i386/xen/xen-hvm.c | 25 +++-- include/hw/xen/xen.h | 7 +++ stubs/xen-hvm.c | 4 5 files changed, 51 insertions(+), 11 deletions(-) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index 36a6cc450e..5f57c1bef3 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -22,6 +22,7 @@ #include "qemu/osdep.h" #include #include "hw/acpi/aml-build.h" +#include "hw/xen/xen.h" #include "qemu/bswap.h" #include "qemu/bitops.h" #include "sysemu/numa.h" @@ -1531,9 +1532,12 @@ build_header(BIOSLinker *linker, GArray *table_data, h->oem_revision = cpu_to_le32(1); memcpy(h->asl_compiler_id, ACPI_BUILD_APPNAME4, 4); h->asl_compiler_revision = cpu_to_le32(1); -/* Checksum to be filled in by Guest linker */ -bios_linker_loader_add_checksum(linker, ACPI_BUILD_TABLE_FILE, -tbl_offset, len, checksum_offset); +/* No linker is used when QEMU is used as Xen device model. 
*/ +if (!xen_enabled()) { +/* Checksum to be filled in by Guest linker */ +bios_linker_loader_add_checksum(linker, ACPI_BUILD_TABLE_FILE, +tbl_offset, len, checksum_offset); +} } void *acpi_data_push(GArray *table_data, unsigned size) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 5cbdce61a7..7101d380a0 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1252,12 +1252,16 @@ void pc_machine_done(Notifier *notifier, void *data) } } -acpi_setup(); -if (pcms->fw_cfg) { -pc_build_smbios(pcms); -pc_build_feature_control_file(pcms); -/* update FW_CFG_NB_CPUS to account for -device added CPUs */ -fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); +if (!xen_enabled()) { +acpi_setup(); +if (pcms->fw_cfg) { +pc_build_smbios(pcms); +pc_build_feature_control_file(pcms); +/* update FW_CFG_NB_CPUS to account for -device added CPUs */ +fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); +} +} else { +xen_dm_acpi_setup(pcms); } if (pcms->apic_id_limit > 255) { diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c index b74c4ffb9c..d81cc7dbbc 100644 --- a/hw/i386/xen/xen-hvm.c +++ b/hw/i386/xen/xen-hvm.c @@ -265,7 +265,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr, /* RAM already populated in Xen */ fprintf(stderr, "%s: do not alloc "RAM_ADDR_FMT " bytes of ram at "RAM_ADDR_FMT" when runstate is INMIGRATE\n", -__func__, size, ram_addr); +__func__, size, ram_addr); return; } @@ -1251,7 +1251,7 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data) static int xen_dm_acpi_needed(PCMachineState *pcms) { -return 0; +return pcms->acpi_nvdimm_state.is_enabled; } static int dm_acpi_buf_init(XenIOState *state) @@ -1309,6 +1309,20 @@ static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state) return dm_acpi_buf_init(state); } +static void xen_dm_acpi_nvdimm_setup(PCMachineState *pcms) +{ +GArray *table_offsets = g_array_new(false, true /* clear */, +sizeof(uint32_t)); +GArray *table_data = g_array_new(false, 
true /* clear */, 1); + +nvdimm_build_acpi(table_offsets, table_data, + NULL, &pcms->acpi_nvdimm_state, + MACHINE(pcms)->ram_slots); + +g_array_free(table_offsets, true); +g_array_free(table_data, true); +} + static int xs_write_dm_acpi_blob_entry(const char *name, const char *entry, const char *value) { @@ -1408,6 +1422,13 @@ int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length, return 0; } +void xen_dm_acpi_setup(PCMachineState *pcms) +{ +if (pcms->acpi_nvdimm_state.is_enabled) { +xen_dm_acpi_nvdimm_setup(pcms); +} +} + void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory) { int i, rc; diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h index 38dcd1a7d4..8c48195e12
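The dispatch that patch 10 adds to pc_machine_done() and nvdimm_build_acpi() boils down to a three-way choice. The enum and function below are an illustrative model of that control flow, not QEMU code.

```c
#include <assert.h>

enum acpi_path {
    ACPI_FWCFG_LINKER,  /* plain QEMU: acpi_setup() + BIOS linker/loader */
    ACPI_XEN_DM,        /* Xen with nvdimm=on: xen_dm_acpi_setup() */
    ACPI_XEN_NONE       /* Xen without vNVDIMM: Xen builds all guest ACPI */
};

/* Model of the branch added in pc_machine_done()/nvdimm_build_acpi(). */
static enum acpi_path choose_acpi_path(int xen_enabled, int nvdimm_enabled)
{
    if (!xen_enabled) {
        return ACPI_FWCFG_LINKER;
    }
    return nvdimm_enabled ? ACPI_XEN_DM : ACPI_XEN_NONE;
}
```

This also explains why build_header() skips bios_linker_loader_add_checksum() on Xen: no guest-side linker ever runs, so the copied tables must already be complete.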
[Xen-devel] [RFC QEMU PATCH v3 08/10] nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
Xen relies on QEMU to build the ACPI namespace device of vNVDIMM for Xen guest. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: "Michael S. Tsirkin" <m...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: Xiao Guangrong <xiaoguangrong.e...@gmail.com> --- hw/acpi/nvdimm.c | 55 ++- 1 file changed, 38 insertions(+), 17 deletions(-) diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c index d9cdc5a531..bf887512ad 100644 --- a/hw/acpi/nvdimm.c +++ b/hw/acpi/nvdimm.c @@ -1226,22 +1226,8 @@ static void nvdimm_build_nvdimm_devices(Aml *root_dev, uint32_t ram_slots) } } -static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data, - BIOSLinker *linker, GArray *dsm_dma_arrea, - uint32_t ram_slots) +static void nvdimm_build_ssdt_device(Aml *dev, uint32_t ram_slots) { -Aml *ssdt, *sb_scope, *dev; -int mem_addr_offset, nvdimm_ssdt; - -acpi_add_table(table_offsets, table_data); - -ssdt = init_aml_allocator(); -acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader)); - -sb_scope = aml_scope("\\_SB"); - -dev = aml_device("NVDR"); - /* * ACPI 6.0: 9.20 NVDIMM Devices: * @@ -1262,6 +1248,25 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data, nvdimm_build_fit(dev); nvdimm_build_nvdimm_devices(dev, ram_slots); +} + +static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data, + BIOSLinker *linker, GArray *dsm_dma_arrea, + uint32_t ram_slots) +{ +Aml *ssdt, *sb_scope, *dev; +int mem_addr_offset, nvdimm_ssdt; + +acpi_add_table(table_offsets, table_data); + +ssdt = init_aml_allocator(); +acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader)); + +sb_scope = aml_scope("\\_SB"); + +dev = aml_device("NVDR"); + +nvdimm_build_ssdt_device(dev, ram_slots); aml_append(sb_scope, dev); aml_append(ssdt, sb_scope); @@ -1285,6 +1290,18 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data, free_aml_allocator(); } +static void nvdimm_build_xen_ssdt(uint32_t ram_slots) +{ +Aml *dev = init_aml_allocator(); + 
+nvdimm_build_ssdt_device(dev, ram_slots); +build_append_named_dword(dev->buf, NVDIMM_ACPI_MEM_ADDR); +xen_acpi_copy_to_guest("NVDR", dev->buf->data, dev->buf->len, + XEN_DM_ACPI_BLOB_TYPE_NSDEV); + +free_aml_allocator(); +} + void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data, BIOSLinker *linker, AcpiNVDIMMState *state, uint32_t ram_slots) @@ -1296,8 +1313,12 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data, return; } -nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem, - ram_slots); +if (!xen_enabled()) { +nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem, + ram_slots); +} else { +nvdimm_build_xen_ssdt(ram_slots); +} device_list = nvdimm_get_device_list(); /* no NVDIMM device is plugged. */ -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest
These are the QEMU-side patches that work with the associated Xen patches to enable vNVDIMM support for Xen HVM domains. Xen relies on QEMU to build guest NFIT and NVDIMM namespace devices, and allocate guest address space for vNVDIMM devices. All patches can be found at Xen: https://github.com/hzzhan9/xen.git nvdimm-rfc-v3 QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3 Patch 1 is to avoid dereferencing the NULL pointer to non-existing label data, as the Xen side support for labels is not implemented yet. Patches 2 & 3 add a memory backend dedicated for Xen usage and a hotplug memory region for Xen guests, in order to make the existing nvdimm device plugging path work on Xen. Patches 4 - 10 build and copy NFIT from QEMU to the Xen guest, when QEMU is used as the Xen device model. Haozhong Zhang (10): nvdimm: do not initialize nvdimm->label_data if label size is zero hw/xen-hvm: create the hotplug memory region on Xen hostmem-xen: add a host memory backend for Xen nvdimm acpi: do not use fw_cfg on Xen hw/xen-hvm: initialize DM ACPI hw/xen-hvm: add function to copy ACPI into guest memory nvdimm acpi: copy NFIT to Xen guest nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest nvdimm acpi: do not build _FIT method on Xen hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled backends/Makefile.objs | 1 + backends/hostmem-xen.c | 108 ++ backends/hostmem.c | 9 +++ hw/acpi/aml-build.c| 10 ++- hw/acpi/nvdimm.c | 79 ++- hw/i386/pc.c | 102 ++--- hw/i386/xen/xen-hvm.c | 204 - hw/mem/nvdimm.c| 10 ++- hw/mem/pc-dimm.c | 6 +- include/hw/i386/pc.h | 1 + include/hw/xen/xen.h | 25 ++ stubs/xen-hvm.c| 10 +++ 12 files changed, 495 insertions(+), 70 deletions(-) create mode 100644 backends/hostmem-xen.c -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC QEMU PATCH v3 02/10] hw/xen-hvm: create the hotplug memory region on Xen
The guest physical address of vNVDIMM is allocated from the hotplug memory region, which is not created when QEMU is used as Xen device model. In order to use vNVDIMM for Xen HVM domains, this commit reuses the code for pc machine type to create the hotplug memory region for Xen HVM domains. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> Cc: "Michael S. Tsirkin" <m...@redhat.com> Cc: Stefano Stabellini <sstabell...@kernel.org> Cc: Anthony Perard <anthony.per...@citrix.com> --- hw/i386/pc.c | 86 --- hw/i386/xen/xen-hvm.c | 2 ++ include/hw/i386/pc.h | 1 + 3 files changed, 51 insertions(+), 38 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 21081041d5..5cbdce61a7 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1347,6 +1347,53 @@ void xen_load_linux(PCMachineState *pcms) pcms->fw_cfg = fw_cfg; } +void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory) +{ +MachineState *machine = MACHINE(pcms); +PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); +ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size; + +if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size) +return; + +if (memory_region_size(&pcms->hotplug_memory.mr)) { +error_report("hotplug memory region has been initialized"); +exit(EXIT_FAILURE); +} + +if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) { +error_report("unsupported amount of memory slots: %"PRIu64, + machine->ram_slots); +exit(EXIT_FAILURE); +} + +if (QEMU_ALIGN_UP(machine->maxram_size, + TARGET_PAGE_SIZE) != machine->maxram_size) { +error_report("maximum memory size must by aligned to multiple of " + "%d bytes", TARGET_PAGE_SIZE); +exit(EXIT_FAILURE); +} + +pcms->hotplug_memory.base = +ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30); + +if (pcmc->enforce_aligned_dimm) { +/* size hotplug region assuming 1G page max alignment per slot */
+hotplug_mem_size += (1ULL << 30) * machine->ram_slots; +} + +if ((pcms->hotplug_memory.base + hotplug_mem_size) < hotplug_mem_size) { +error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT, + machine->maxram_size); +exit(EXIT_FAILURE); +} + +memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms), + "hotplug-memory", hotplug_mem_size); +memory_region_add_subregion(system_memory, pcms->hotplug_memory.base, +&pcms->hotplug_memory.mr); +} + void pc_memory_init(PCMachineState *pcms, MemoryRegion *system_memory, MemoryRegion *rom_memory, @@ -1398,44 +1445,7 @@ void pc_memory_init(PCMachineState *pcms, } /* initialize hotplug memory address space */ -if (pcmc->has_reserved_memory && -(machine->ram_size < machine->maxram_size)) { -ram_addr_t hotplug_mem_size = -machine->maxram_size - machine->ram_size; - -if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) { -error_report("unsupported amount of memory slots: %"PRIu64, - machine->ram_slots); -exit(EXIT_FAILURE); -} - -if (QEMU_ALIGN_UP(machine->maxram_size, - TARGET_PAGE_SIZE) != machine->maxram_size) { -error_report("maximum memory size must by aligned to multiple of " - "%d bytes", TARGET_PAGE_SIZE); -exit(EXIT_FAILURE); -} - -pcms->hotplug_memory.base = -ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30); - -if (pcmc->enforce_aligned_dimm) { -/* size hotplug region assuming 1G page max alignment per slot */ -hotplug_mem_size += (1ULL << 30) * machine->ram_slots; -} - -if ((pcms->hotplug_memory.base + hotplug_mem_size) < -hotplug_mem_size) { -error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT, - machine->maxram_size); -exit(EXIT_FAILURE); -} - -memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms), - "hotplug-memory", hotplug_mem_size); -memory_region_add_subregion(system_memory, pcms->hotplug_memory.base, -&pcms->hotplug_memory.mr); -} +pc_mem
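The layout arithmetic moved into pc_memory_hotplug_init() above can be sketched on its own: the hotplug region starts 1 GiB-aligned above the end of RAM beyond 4 GiB, and is sized as maxram minus ram, padded by 1 GiB per slot when aligned DIMMs are enforced. Names and the ROUND_UP macro below are illustrative stand-ins for QEMU's.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)
#define ROUND_UP_POW2(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Base of the hotplug region: 1 GiB-aligned above 4 GiB + high RAM. */
static uint64_t hotplug_base(uint64_t above_4g_mem_size)
{
    return ROUND_UP_POW2(0x100000000ULL + above_4g_mem_size, GiB);
}

/* Size of the region, with worst-case 1 GiB alignment gap per slot. */
static uint64_t hotplug_size(uint64_t maxram, uint64_t ram,
                             unsigned ram_slots, int enforce_aligned_dimm)
{
    uint64_t size = maxram - ram;

    if (enforce_aligned_dimm) {
        size += GiB * ram_slots;
    }
    return size;
}
```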
[Xen-devel] [RFC QEMU PATCH v3 05/10] hw/xen-hvm: initialize DM ACPI
Probe the base address and the length of the guest ACPI buffer reserved for copying ACPI from QEMU. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Stefano Stabellini <sstabell...@kernel.org> Cc: Anthony Perard <anthony.per...@citrix.com> Cc: "Michael S. Tsirkin" <m...@redhat.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> --- hw/i386/xen/xen-hvm.c | 66 +++ 1 file changed, 66 insertions(+) diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c index 90163e1a1b..ae895aaf03 100644 --- a/hw/i386/xen/xen-hvm.c +++ b/hw/i386/xen/xen-hvm.c @@ -18,6 +18,7 @@ #include "hw/xen/xen_backend.h" #include "qmp-commands.h" +#include "qemu/cutils.h" #include "qemu/error-report.h" #include "qemu/range.h" #include "sysemu/xen-mapcache.h" @@ -86,6 +87,18 @@ typedef struct XenPhysmap { QLIST_ENTRY(XenPhysmap) list; } XenPhysmap; +#define HVM_XS_DM_ACPI_ROOT"/hvmloader/dm-acpi" +#define HVM_XS_DM_ACPI_ADDRESS HVM_XS_DM_ACPI_ROOT"/address" +#define HVM_XS_DM_ACPI_LENGTH HVM_XS_DM_ACPI_ROOT"/length" + +typedef struct XenAcpiBuf { +ram_addr_t base; +ram_addr_t length; +ram_addr_t used; +} XenAcpiBuf; + +static XenAcpiBuf *dm_acpi_buf; + typedef struct XenIOState { ioservid_t ioservid; shared_iopage_t *shared_page; @@ -110,6 +123,8 @@ typedef struct XenIOState { hwaddr free_phys_offset; const XenPhysmap *log_for_dirtybit; +XenAcpiBuf dm_acpi_buf; + Notifier exit; Notifier suspend; Notifier wakeup; @@ -1234,6 +1249,52 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data) xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0); } +static int xen_dm_acpi_needed(PCMachineState *pcms) +{ +return 0; +} + +static int dm_acpi_buf_init(XenIOState *state) +{ +char path[80], *value; +unsigned int len; + +dm_acpi_buf = &state->dm_acpi_buf; + +snprintf(path, sizeof(path), + "/local/domain/%d"HVM_XS_DM_ACPI_ADDRESS, xen_domid); +value = xs_read(state->xenstore, 0, path, &len); +if (!value) { +return
-EINVAL; +} +if (qemu_strtoul(value, NULL, 16, &dm_acpi_buf->base)) { +return -EINVAL; +} + +snprintf(path, sizeof(path), + "/local/domain/%d"HVM_XS_DM_ACPI_LENGTH, xen_domid); +value = xs_read(state->xenstore, 0, path, &len); +if (!value) { +return -EINVAL; +} +if (qemu_strtoul(value, NULL, 16, &dm_acpi_buf->length)) { +return -EINVAL; +} + +dm_acpi_buf->used = 0; + +return 0; +} + +static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state) +{ +if (!xen_dm_acpi_needed(pcms)) { +return 0; +} + +return dm_acpi_buf_init(state); +} + void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory) { int i, rc; @@ -1385,6 +1446,11 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory) /* Disable ACPI build because Xen handles it */ pcms->acpi_build_enabled = false; +if (xen_dm_acpi_init(pcms, state)) { +error_report("failed to initialize xen ACPI"); +goto err; +} + return; err: -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
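dm_acpi_buf_init() above reads two hex strings from XenStore and parses each with qemu_strtoul(), rejecting missing or malformed values with -EINVAL. The same parsing step can be sketched with plain strtoul(); the function name is illustrative and the semantics (full-string hex parse or error) match the patch's intent.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static unsigned long parsed;  /* scratch for the assertions below */

/*
 * Sketch of the address/length parsing in dm_acpi_buf_init().
 * Returns 0 on success, -EINVAL if the value is absent or not a
 * complete hexadecimal number.
 */
static int parse_xs_hex(const char *value, unsigned long *out)
{
    char *end;

    if (!value || !*value) {
        return -EINVAL;
    }
    errno = 0;
    *out = strtoul(value, &end, 16);
    if (errno || *end != '\0') {
        return -EINVAL;
    }
    return 0;
}
```

On any parse failure xen_dm_acpi_init() fails, which xen_hvm_init() treats as fatal ("failed to initialize xen ACPI").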
[Xen-devel] [RFC QEMU PATCH v3 03/10] hostmem-xen: add a host memory backend for Xen
vNVDIMM requires a host memory backend to allocate its backend resources to the guest. When QEMU is used as Xen device model, the backend resource allocation of vNVDIMM is managed out of QEMU. A new host memory backend 'memory-backend-xen' is introduced to represent the backend resource allocated by Xen. It simply creates a memory region of the specified size as a placeholder in the guest address space, which will be mapped by Xen to the actual backend resource. The following example QEMU options create a vNVDIMM device backed by a 4GB host PMEM region at host physical address 0x100000000: -object memory-backend-xen,id=mem1,host-addr=0x100000000,size=4G -device nvdimm,id=nvdimm1,memdev=mem1 Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Eduardo Habkost <ehabk...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: "Michael S. Tsirkin" <m...@redhat.com> --- backends/Makefile.objs | 1 + backends/hostmem-xen.c | 108 + backends/hostmem.c | 9 + hw/mem/pc-dimm.c | 6 ++- 4 files changed, 123 insertions(+), 1 deletion(-) create mode 100644 backends/hostmem-xen.c diff --git a/backends/Makefile.objs b/backends/Makefile.objs index 0400799efd..3096fde21f 100644 --- a/backends/Makefile.objs +++ b/backends/Makefile.objs @@ -5,6 +5,7 @@ common-obj-$(CONFIG_TPM) += tpm.o common-obj-y += hostmem.o hostmem-ram.o common-obj-$(CONFIG_LINUX) += hostmem-file.o +common-obj-${CONFIG_XEN_BACKEND} += hostmem-xen.o common-obj-y += cryptodev.o common-obj-y += cryptodev-builtin.o diff --git a/backends/hostmem-xen.c b/backends/hostmem-xen.c new file mode 100644 index 00..99211efd81 --- /dev/null +++ b/backends/hostmem-xen.c @@ -0,0 +1,108 @@ +/* + * QEMU Host Memory Backend for Xen + * + * Copyright(C) 2017 Intel Corporation.
+ * + * Author: + * Haozhong Zhang <haozhong.zh...@intel.com> + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see <http://www.gnu.org/licenses/> + */ + +#include "qemu/osdep.h" +#include "sysemu/hostmem.h" +#include "qapi/error.h" +#include "qom/object_interfaces.h" + +#define TYPE_MEMORY_BACKEND_XEN "memory-backend-xen" + +#define MEMORY_BACKEND_XEN(obj) \ +OBJECT_CHECK(HostMemoryBackendXen, (obj), TYPE_MEMORY_BACKEND_XEN) + +typedef struct HostMemoryBackendXen HostMemoryBackendXen; + +struct HostMemoryBackendXen { +HostMemoryBackend parent_obj; + +uint64_t host_addr; +}; + +static void xen_backend_get_host_addr(Object *obj, Visitor *v, const char *name, + void *opaque, Error **errp) +{ +HostMemoryBackendXen *backend = MEMORY_BACKEND_XEN(obj); +uint64_t value = backend->host_addr; + +visit_type_size(v, name, &value, errp); +} + +static void xen_backend_set_host_addr(Object *obj, Visitor *v, const char *name, + void *opaque, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(obj); +HostMemoryBackendXen *xb = MEMORY_BACKEND_XEN(obj); +Error *local_err = NULL; +uint64_t value; + +if (memory_region_size(&backend->mr)) { +error_setg(&local_err, "cannot change property value"); +goto out; +} + +visit_type_size(v, name, &value, &local_err); +if (local_err) { +goto out; +} +xb->host_addr = value; + + out: +error_propagate(errp, local_err); +} + +static void xen_backend_alloc(HostMemoryBackend *backend, Error
**errp) +{ +if (!backend->size) { +error_setg(errp, "can't create backend with size 0"); +return; +} +memory_region_init(&backend->mr, OBJECT(backend), "hostmem-xen", + backend->size); +backend->mr.align = getpagesize(); +} + +static void xen_backend_class_init(ObjectClass *oc, void *data) +{ +HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc); + +bc->alloc = xen_backend_alloc; + +object_class_property_add(oc, "host-addr", "int", + xen_backend_get_host_addr, + xen_backend_set_host_addr, + NULL, NULL, &error_abort); +} + +static const TypeInfo xen_backend_info = { +.name = TYPE_MEMORY_BACKEND_XEN, +.parent = TYPE_MEMORY_BACKEND, +.class_init = xen_ba
[Xen-devel] [RFC QEMU PATCH v3 04/10] nvdimm acpi: do not use fw_cfg on Xen
Xen relies on QEMU to build guest ACPI for NVDIMM. However, no fw_cfg is created when QEMU is used as Xen device model, so QEMU should avoid using fw_cfg on Xen. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Xiao Guangrong <xiaoguangrong.e...@gmail.com> Cc: "Michael S. Tsirkin" <m...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> --- hw/acpi/nvdimm.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c index 6ceea196e7..9121a766c6 100644 --- a/hw/acpi/nvdimm.c +++ b/hw/acpi/nvdimm.c @@ -32,6 +32,7 @@ #include "hw/acpi/bios-linker-loader.h" #include "hw/nvram/fw_cfg.h" #include "hw/mem/nvdimm.h" +#include "hw/xen/xen.h" static int nvdimm_device_list(Object *obj, void *opaque) { @@ -890,8 +891,12 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io, state->dsm_mem = g_array_new(false, true /* clear */, 1); acpi_data_push(state->dsm_mem, sizeof(NvdimmDsmIn)); -fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data, -state->dsm_mem->len); + +/* No fw_cfg is created when QEMU is used as Xen device model. */ +if (!xen_enabled()) { +fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data, +state->dsm_mem->len); +} nvdimm_init_fit_buffer(>fit_buf); } -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC QEMU PATCH v3 06/10] hw/xen-hvm: add function to copy ACPI into guest memory
Xen relies on QEMU to build guest NFIT and NVDIMM namespace devices, and implements an interface to allow QEMU to copy its ACPI into guest memory. This commit implements the QEMU side support. The location of guest memory that can receive QEMU ACPI can be found from the XenStore entries /local/domain/$dom_id/hvmloader/dm-acpi/{address,length}, which were set up by the previous commit. QEMU ACPI copied to the guest is organized in blobs. For each blob, QEMU creates the following XenStore entries under /local/domain/$dom_id/hvmloader/dm-acpi/$name to indicate its type, its location in the above guest memory region, and its size. - type the type of the passed ACPI, which can be one of the following values. * XEN_DM_ACPI_BLOB_TYPE_TABLE (0) indicates it's a complete ACPI table, and its signature is indicated by $name in the XenStore path. * XEN_DM_ACPI_BLOB_TYPE_NSDEV (1) indicates it's the body of a namespace device, and its device name is indicated by $name in the XenStore path. - offset offset in bytes from the beginning of the above guest memory region - length size in bytes of the copied ACPI Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Stefano Stabellini <sstabell...@kernel.org> Cc: Anthony Perard <anthony.per...@citrix.com> Cc: "Michael S.
Tsirkin" <m...@redhat.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> --- hw/i386/xen/xen-hvm.c | 113 ++ include/hw/xen/xen.h | 18 stubs/xen-hvm.c | 6 +++ 3 files changed, 137 insertions(+) diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c index ae895aaf03..b74c4ffb9c 100644 --- a/hw/i386/xen/xen-hvm.c +++ b/hw/i386/xen/xen-hvm.c @@ -1286,6 +1286,20 @@ static int dm_acpi_buf_init(XenIOState *state) return 0; } +static ram_addr_t dm_acpi_buf_alloc(size_t length) +{ +ram_addr_t addr; + +if (dm_acpi_buf->length - dm_acpi_buf->used < length) { +return 0; +} + +addr = dm_acpi_buf->base + dm_acpi_buf->used; +dm_acpi_buf->used += length; + +return addr; +} + static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state) { if (!xen_dm_acpi_needed(pcms)) { @@ -1295,6 +1309,105 @@ static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state) return dm_acpi_buf_init(state); } +static int xs_write_dm_acpi_blob_entry(const char *name, + const char *entry, const char *value) +{ +XenIOState *state = container_of(dm_acpi_buf, XenIOState, dm_acpi_buf); +char path[80]; + +snprintf(path, sizeof(path), + "/local/domain/%d"HVM_XS_DM_ACPI_ROOT"/%s/%s", + xen_domid, name, entry); +if (!xs_write(state->xenstore, 0, path, value, strlen(value))) { +return -EIO; +} + +return 0; +} + +static size_t xen_memcpy_to_guest(ram_addr_t gpa, + const void *buf, size_t length) +{ +size_t copied = 0, size; +ram_addr_t s, e, offset, cur = gpa; +xen_pfn_t cur_pfn; +void *page; + +if (!buf || !length) { +return 0; +} + +s = gpa & TARGET_PAGE_MASK; +e = gpa + length; +if (e < s) { +return 0; +} + +while (cur < e) { +cur_pfn = cur >> TARGET_PAGE_BITS; +offset = cur - (cur_pfn << TARGET_PAGE_BITS); +size = (length >= TARGET_PAGE_SIZE - offset) ? 
+ TARGET_PAGE_SIZE - offset : length; + +page = xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ | PROT_WRITE, +1, &cur_pfn, NULL); +if (!page) { +break; +} + +memcpy(page + offset, buf, size); +xenforeignmemory_unmap(xen_fmem, page, 1); + +copied += size; +buf += size; +cur += size; +length -= size; +} + +return copied; +} + +int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length, + int type) +{ +char value[21]; +ram_addr_t buf_addr; +int rc; + +if (type != XEN_DM_ACPI_BLOB_TYPE_TABLE && +type != XEN_DM_ACPI_BLOB_TYPE_NSDEV) { +return -EINVAL; +} + +buf_addr = dm_acpi_buf_alloc(length); +if (!buf_addr) { +return -ENOMEM; +} +if (xen_memcpy_to_guest(buf_addr, blob, length) != length) { +return -EIO; +} + +snprintf(value, sizeof(value), "%d", type); +rc = xs_write_dm_acpi_blob_entry(name, "type", value); +if (rc) { +return rc; +} + +snprintf(value, sizeof(value), "%"PRIu64, buf_addr - dm_acpi_buf->base); +rc = xs_write_dm_acpi_blob_entry(name, "offset", value); +if (rc) { +return rc; +} + +snprintf(value, sizeof(val
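xen_memcpy_to_guest() walks the destination page by page because each guest frame must be foreign-mapped individually with xenforeignmemory_map(). The chunking logic can be modeled over a flat buffer; here the "map" step is simply an offset into `fake_guest[]`, and all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096ULL

static uint8_t fake_guest[3 * 4096];  /* stands in for foreign-mapped frames */

/*
 * Model of xen_memcpy_to_guest(): copy at most one page per iteration,
 * honoring the sub-page offset of the first chunk. Returns the number
 * of bytes copied (0 on bad arguments or overflow).
 */
static size_t memcpy_to_guest(uint64_t gpa, const void *buf, size_t length)
{
    size_t copied = 0, size;
    uint64_t cur = gpa, end = gpa + length;

    if (!buf || !length || end < gpa || end > sizeof(fake_guest)) {
        return 0;
    }
    while (cur < end) {
        uint64_t offset = cur & (PAGE_SIZE - 1);

        size = PAGE_SIZE - offset;          /* bytes left in this frame */
        if (size > length) {
            size = length;
        }
        /* real code: xenforeignmemory_map() one frame, memcpy, unmap */
        memcpy(fake_guest + cur, (const uint8_t *)buf + copied, size);

        copied += size;
        cur += size;
        length -= size;
    }
    return copied;
}
```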
[Xen-devel] [RFC QEMU PATCH v3 09/10] nvdimm acpi: do not build _FIT method on Xen
Xen currently does not support vNVDIMM hotplug and always sets QEMU option "maxmem" to be just enough for RAM and vNVDIMM, so it's not necessary to build _FIT method when QEMU is used as Xen device model. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: "Michael S. Tsirkin" <m...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: Xiao Guangrong <xiaoguangrong.e...@gmail.com> --- hw/acpi/nvdimm.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c index bf887512ad..61789c3966 100644 --- a/hw/acpi/nvdimm.c +++ b/hw/acpi/nvdimm.c @@ -1245,7 +1245,14 @@ static void nvdimm_build_ssdt_device(Aml *dev, uint32_t ram_slots) /* 0 is reserved for root device. */ nvdimm_build_device_dsm(dev, 0); -nvdimm_build_fit(dev); +/* + * Xen does not support vNVDIMM hotplug, and always sets the QEMU + * option "maxmem" to be just enough for RAM and static plugged + * vNVDIMM, so it's unnecessary to build _FIT method on Xen. + */ +if (!xen_enabled()) { +nvdimm_build_fit(dev); +} nvdimm_build_nvdimm_devices(dev, ram_slots); } -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC XEN PATCH v3 34/39] tools/libacpi: add DM ACPI blacklists
Some guest ACPI tables and namespace devices are constructed by Xen, and should not be loaded from device model. This commit adds their table signatures and device names into two blacklists, which will be used to check the collisions between guest ACPI constructed by Xen and guest ACPI passed from device model. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/libacpi/build.c | 93 + tools/libacpi/libacpi.h | 5 +++ 2 files changed, 98 insertions(+) diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c index f9881c9604..493ca48025 100644 --- a/tools/libacpi/build.c +++ b/tools/libacpi/build.c @@ -56,6 +56,76 @@ struct acpi_info { uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */ }; +/* ACPI tables of following signatures should not appear in DM ACPI */ +static uint64_t dm_acpi_signature_blacklist[64]; +/* ACPI namespace devices of following names should not appear in DM ACPI */ +static const char *dm_acpi_devname_blacklist[64]; + +static int dm_acpi_blacklist_signature(struct acpi_config *config, uint64_t sig) +{ +unsigned int i, nr = ARRAY_SIZE(dm_acpi_signature_blacklist); + +if ( !(config->table_flags & ACPI_HAS_DM) ) +return 0; + +for ( i = 0; i < nr; i++ ) +{ +uint64_t entry = dm_acpi_signature_blacklist[i]; + +if ( entry == sig ) +return 0; +else if ( entry == 0 ) +break; +} + +if ( i >= nr ) +{ +config->table_flags &= ~ACPI_HAS_DM; + +printf("ERROR: DM ACPI signature blacklist is full (size %u), " + "disable DM ACPI\n", nr); + +return -ENOSPC; +} + +dm_acpi_signature_blacklist[i] = sig; + +return 0; +} + +static int dm_acpi_blacklist_devname(struct acpi_config *config, + const char *devname) +{ +unsigned int i, nr = ARRAY_SIZE(dm_acpi_devname_blacklist); + +if ( !(config->table_flags & ACPI_HAS_DM) ) +return 0; + +for ( i = 0; i < nr; i++ ) +{ +const char *entry = dm_acpi_devname_blacklist[i]; + +if ( 
!entry ) +break; +if ( !strncmp(entry, devname, 4) ) +return 0; +} + +if ( i >= nr ) +{ +config->table_flags &= ~ACPI_HAS_DM; + +printf("ERROR: DM ACPI devname blacklist is full (size %u), " + "disable loading DM ACPI\n", nr); + +return -ENOSPC; +} + +dm_acpi_devname_blacklist[i] = devname; + +return 0; +} + static void set_checksum( void *table, uint32_t checksum_offset, uint32_t length) { @@ -360,6 +430,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt, madt = construct_madt(ctxt, config, info); if (!madt) return -1; table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, madt); +dm_acpi_blacklist_signature(config, madt->header.signature); } /* HPET. */ @@ -368,6 +439,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt, hpet = construct_hpet(ctxt, config); if (!hpet) return -1; table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, hpet); +dm_acpi_blacklist_signature(config, hpet->header.signature); } /* WAET. */ @@ -377,6 +449,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt, if ( !waet ) return -1; table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, waet); +dm_acpi_blacklist_signature(config, waet->header.signature); } if ( config->table_flags & ACPI_HAS_SSDT_PM ) @@ -385,6 +458,9 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt, if (!ssdt) return -1; memcpy(ssdt, ssdt_pm, sizeof(ssdt_pm)); table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, ssdt); +dm_acpi_blacklist_devname(config, "AC"); +dm_acpi_blacklist_devname(config, "BAT0"); +dm_acpi_blacklist_devname(config, "BAT1"); } if ( config->table_flags & ACPI_HAS_SSDT_S3 ) @@ -450,6 +526,8 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt, offsetof(struct acpi_header, checksum), tcpa->header.length); } +dm_acpi_blacklist_signature(config, tcpa->header.signature); +dm_acpi_blacklist_devname(config, "TPM"); } /* SRAT and SLIT */ @@ -459,11 +537,17 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt, struct acpi_20_slit *slit = 
construct_slit(ctxt, config); if ( srat ) +{ table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ct
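The blacklist logic in patch 34 is an append-only fixed-size table: adding a signature that is already present is a no-op, and overflowing the table disables DM ACPI entirely. A minimal model (tiny table, plain return codes instead of -ENOSPC and the ACPI_HAS_DM flag) looks like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BL_SIZE 4  /* tiny for illustration; Xen's tables hold 64 entries */

static uint64_t bl[BL_SIZE];

/*
 * Sketch of dm_acpi_blacklist_signature(): record sig unless it is
 * already present; return -1 when the table is full (Xen additionally
 * clears ACPI_HAS_DM so no DM ACPI is loaded at all).
 */
static int bl_add(uint64_t sig)
{
    size_t i;

    for (i = 0; i < BL_SIZE; i++) {
        if (bl[i] == sig) {
            return 0;       /* already blacklisted */
        }
        if (bl[i] == 0) {
            bl[i] = sig;    /* first free slot */
            return 0;
        }
    }
    return -1;              /* full: caller disables DM ACPI */
}
```

Failing closed on overflow is the safe choice here: if Xen cannot track what it built itself, it refuses all device-model ACPI rather than risk a collision.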
[Xen-devel] [RFC XEN PATCH v3 37/39] tools/libxl: allow aborting domain creation on fatal QMP init errors
Some errors during QMP initialization can affect the proper functioning of a domain, so it is better to treat them as fatal and abort the creation of that domain. The existing types of QMP initialization errors are still treated as non-fatal and, as before, do not abort domain creation. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/libxl/libxl_create.c | 4 +++- tools/libxl/libxl_qmp.c| 9 ++--- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 9123585b52..3e05ea09e9 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -1507,7 +1507,9 @@ static void domcreate_devmodel_started(libxl__egc *egc, if (dcs->sdss.dm.guest_domid) { if (d_config->b_info.device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) { -libxl__qmp_initializations(gc, domid, d_config); +ret = libxl__qmp_initializations(gc, domid, d_config); +if (ret == ERROR_BADFAIL) +goto error_out; } } diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index eab993aca9..e1eb47c1d2 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -1175,11 +1175,12 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid, { const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config); libxl__qmp_handler *qmp = NULL; -int ret = 0; +bool ignore_error = true; +int ret = -1; qmp = libxl__qmp_initialize(gc, domid); if (!qmp) -return -1; +goto out; ret = libxl__qmp_query_serial(qmp); if (!ret && vnc && vnc->passwd) { ret = qmp_change(gc, qmp, "vnc", "password", vnc->passwd); @@ -1189,7 +1190,9 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid, ret = qmp_query_vnc(qmp); } libxl__qmp_close(qmp); -return ret; + + out: +return ret ? (ignore_error ? ERROR_FAIL : ERROR_BADFAIL) : 0; } /* -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
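The status mapping this patch introduces is small enough to model directly: libxl__qmp_initializations() returns 0 on success, ERROR_FAIL for ignorable errors, and ERROR_BADFAIL for fatal ones; only the last aborts domain creation in domcreate_devmodel_started(). The numeric values below are illustrative stand-ins, not libxl's real error codes.

```c
#include <assert.h>

/* Illustrative stand-ins; libxl's actual error code values differ. */
enum { LIBXL_OK = 0, ERROR_FAIL = -3, ERROR_BADFAIL = -6 };

/* Model of the final mapping in libxl__qmp_initializations(). */
static int qmp_init_status(int ret, int ignore_error)
{
    return ret ? (ignore_error ? ERROR_FAIL : ERROR_BADFAIL) : LIBXL_OK;
}

/* Model of the new check in domcreate_devmodel_started(): only
 * ERROR_BADFAIL makes domain creation jump to error_out. */
static int aborts_creation(int status)
{
    return status == ERROR_BADFAIL;
}
```

In this patch `ignore_error` is still always true, so behavior is unchanged; later patches in the series can flip it for errors that must be fatal.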
[Xen-devel] [RFC XEN PATCH v3 36/39] tools/xl: add xl domain configuration for virtual NVDIMM devices
A new xl domain configuration vnvdimms = [ 'type=mfn, backend=START_PMEM_MFN, nr_pages=N', ... ] is added to specify the virtual NVDIMM devices backed by the specified host PMEM pages. As the kernel PMEM driver does not work in Dom0 now, we have to specify MFNs. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- docs/man/xl.cfg.pod.5.in| 33 + tools/libxl/Makefile| 2 +- tools/libxl/libxl.h | 5 ++ tools/libxl/libxl_types.idl | 15 ++ tools/libxl/libxl_vnvdimm.c | 49 tools/xl/xl_parse.c | 110 +++- tools/xl/xl_vmcontrol.c | 15 +- 7 files changed, 226 insertions(+), 3 deletions(-) create mode 100644 tools/libxl/libxl_vnvdimm.c diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in index 79cb2eaea7..092b051561 100644 --- a/docs/man/xl.cfg.pod.5.in +++ b/docs/man/xl.cfg.pod.5.in @@ -1116,6 +1116,39 @@ FIFO-based event channel ABI support up to 131,071 event channels. Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit x86). +=item B
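Each vnvdimms entry is a comma-separated key=value string. A hypothetical sketch of parsing one entry of the form "type=mfn,backend=START_PMEM_MFN,nr_pages=N" (the function name and the fixed field order are simplifications; the real parser in xl_parse.c accepts the options in any order):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one vnvdimms entry; @mfn receives the hexadecimal start MFN
 * of the backing host PMEM region, @nr_pages the page count. */
static int parse_vnvdimm(const char *spec, unsigned long *mfn,
                         unsigned long *nr_pages)
{
    char type[8];

    if (sscanf(spec, "type=%7[^,],backend=%lx,nr_pages=%lu",
               type, mfn, nr_pages) != 3)
        return -1;
    /* Only MFN-backed devices exist until the Dom0 PMEM driver works. */
    return strcmp(type, "mfn") ? -1 : 0;
}
```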
[Xen-devel] [RFC XEN PATCH v3 39/39] tools/libxl: build qemu options from xl vNVDIMM configs
For xl configs vnvdimms = [ 'type=mfn,backend=$PMEM0_MFN,nr_pages=$N0', ... ] the following qemu options will be built -machine ,nvdimm -m ,slots=$NR_SLOTS,maxmem=$MEM_SIZE -object memory-backend-xen,id=mem1,host-addr=$PMEM0_ADDR,size=$PMEM0_SIZE -device nvdimm,id=xen_nvdimm1,memdev=mem1 ... in which, - NR_SLOTS is the number of entries in vnvdimms + 1, - MEM_SIZE is the total size of all RAM and NVDIMM devices, - PMEM0_ADDR = PMEM0_MFN * 4096, - PMEM0_SIZE = N0 * 4096, Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/libxl/libxl_dm.c | 81 -- 1 file changed, 79 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index e0e6a99e67..9bdb3cdb29 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -910,6 +910,58 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *target_path, return drive; } +#if defined(__linux__) + +static uint64_t libxl__build_dm_vnvdimm_args( +libxl__gc *gc, flexarray_t *dm_args, +struct libxl_device_vnvdimm *dev, int dev_no) +{ +uint64_t addr = 0, size = 0; +char *arg; + +switch (dev->backend_type) +{ +case LIBXL_VNVDIMM_BACKEND_TYPE_MFN: +addr = dev->u.mfn << XC_PAGE_SHIFT; +size = dev->nr_pages << XC_PAGE_SHIFT; +break; +} + +if (!size) +return 0; + +flexarray_append(dm_args, "-object"); +arg = GCSPRINTF("memory-backend-xen,id=mem%d,host-addr=%"PRIu64",size=%"PRIu64, +dev_no + 1, addr, size); +flexarray_append(dm_args, arg); + +flexarray_append(dm_args, "-device"); +arg = GCSPRINTF("nvdimm,id=xen_nvdimm%d,memdev=mem%d", +dev_no + 1, dev_no + 1); +flexarray_append(dm_args, arg); + +return size; +} + +static uint64_t libxl__build_dm_vnvdimms_args( +libxl__gc *gc, flexarray_t *dm_args, +struct libxl_device_vnvdimm *vnvdimms, int num_vnvdimms) +{ +uint64_t total_size = 0, size; +unsigned int i; + +for (i = 0; i < num_vnvdimms; i++) { +size = libxl__build_dm_vnvdimm_args(gc, 
dm_args, [i], i); +if (!size) +break; +total_size += size; +} + +return total_size; +} + +#endif /* __linux__ */ + static int libxl__build_device_model_args_new(libxl__gc *gc, const char *dm, int guest_domid, const libxl_domain_config *guest_config, @@ -923,13 +975,18 @@ static int libxl__build_device_model_args_new(libxl__gc *gc, const libxl_device_nic *nics = guest_config->nics; const int num_disks = guest_config->num_disks; const int num_nics = guest_config->num_nics; +#if defined(__linux__) +const int num_vnvdimms = guest_config->num_vnvdimms; +#else +const int num_vnvdimms = 0; +#endif const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config); const libxl_sdl_info *sdl = dm_sdl(guest_config); const char *keymap = dm_keymap(guest_config); char *machinearg; flexarray_t *dm_args, *dm_envs; int i, connection, devid, ret; -uint64_t ram_size; +uint64_t ram_size, ram_size_in_byte = 0, vnvdimms_size = 0; const char *path, *chardev; char *user = NULL; @@ -1451,6 +1508,9 @@ static int libxl__build_device_model_args_new(libxl__gc *gc, } } +if (num_vnvdimms) +machinearg = libxl__sprintf(gc, "%s,nvdimm", machinearg); + flexarray_append(dm_args, machinearg); for (i = 0; b_info->extra_hvm && b_info->extra_hvm[i] != NULL; i++) flexarray_append(dm_args, b_info->extra_hvm[i]); @@ -1460,8 +1520,25 @@ static int libxl__build_device_model_args_new(libxl__gc *gc, } ram_size = libxl__sizekb_to_mb(b_info->max_memkb - b_info->video_memkb); +if (num_vnvdimms) { +ram_size_in_byte = ram_size << 20; +vnvdimms_size = libxl__build_dm_vnvdimms_args(gc, dm_args, + guest_config->vnvdimms, + num_vnvdimms); +if (ram_size_in_byte + vnvdimms_size < ram_size_in_byte) { +LOG(ERROR, +"total size of RAM (%"PRIu64") and NVDIMM (%"PRIu64") overflow", +ram_size_in_byte, vnvdimms_size); +return ERROR_INVAL; +} +} flexarray_append(dm_args, "-m"); -flexarray_append(dm_args, GCSPRINTF("%"PRId64, ram_size)); +flexarray_append(dm_args, + vnvdimms_size ? + GCSPRINTF("%"PRId64",slots=%d,maxmem=%"PRId64,
[Xen-devel] [RFC XEN PATCH v3 32/39] tools/libacpi: add callbacks to access XenStore
libacpi needs to access information placed in XenStore in order to load ACPI built by the device model. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/firmware/hvmloader/util.c | 52 +++ tools/firmware/hvmloader/util.h | 9 +++ tools/firmware/hvmloader/xenbus.c | 44 +++-- tools/libacpi/libacpi.h | 10 tools/libxl/libxl_x86_acpi.c | 24 ++ 5 files changed, 126 insertions(+), 13 deletions(-) diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c index 2f8a4654b0..5b8a4ee9d0 100644 --- a/tools/firmware/hvmloader/util.c +++ b/tools/firmware/hvmloader/util.c @@ -893,6 +893,53 @@ static uint32_t acpi_lapic_id(unsigned cpu) return LAPIC_ID(cpu); } +static const char *acpi_xs_read(struct acpi_ctxt *ctxt, const char *path) +{ +return xenstore_read(path, NULL); +} + +static int acpi_xs_write(struct acpi_ctxt *ctxt, + const char *path, const char *value) +{ +return xenstore_write(path, value); +} + +static unsigned int count_strings(const char *strings, unsigned int len) +{ +const char *p; +unsigned int n; + +for ( p = strings, n = 0; p < strings + len; p++ ) +if ( *p == '\0' ) +n++; + +return n; +} + +static char **acpi_xs_directory(struct acpi_ctxt *ctxt, +const char *path, unsigned int *num) +{ +const char *strings; +char *s, *p, **ret; +unsigned int len, n; + +strings = xenstore_directory(path, , NULL); +if ( !strings ) +return NULL; + +n = count_strings(strings, len); +ret = ctxt->mem_ops.alloc(ctxt, n * sizeof(p) + len, 0); +if ( !ret ) +return NULL; +memcpy([n], strings, len); + +s = (char *)[n]; +for ( p = s, *num = 0; p < s + len; p += strlen(p) + 1 ) +ret[(*num)++] = p; + +return ret; +} + void hvmloader_acpi_build_tables(struct acpi_config *config, unsigned int physical) { @@ -998,6 +1045,11 @@ void hvmloader_acpi_build_tables(struct acpi_config *config, 
ctxt.min_alloc_byte_align = 16; +ctxt.xs_ops.read = acpi_xs_read; +ctxt.xs_ops.write = acpi_xs_write; +ctxt.xs_ops.directory = acpi_xs_directory; +ctxt.xs_opaque = NULL; + acpi_build_tables(, config); hvm_param_set(HVM_PARAM_VM_GENERATION_ID_ADDR, config->vm_gid_addr); diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h index e9fe6c6e79..37e62d93c0 100644 --- a/tools/firmware/hvmloader/util.h +++ b/tools/firmware/hvmloader/util.h @@ -225,6 +225,15 @@ const char *xenstore_read(const char *path, const char *default_resp); */ int xenstore_write(const char *path, const char *value); +/* Read a xenstore directory. Return NULL, or a nul-terminated string + * which contains all names of directory entries. Names are separated + * by '\0'. The returned string is in a static buffer, so only valid + * until the next xenstore/xenbus operation. If @default_resp is + * specified, it is returned in preference to a NULL or empty string + * received from xenstore. + */ +const char *xenstore_directory(const char *path, uint32_t *len, + const char *default_resp); /* Get a HVM param. */ diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c index 2b89a56fce..387c0971e1 100644 --- a/tools/firmware/hvmloader/xenbus.c +++ b/tools/firmware/hvmloader/xenbus.c @@ -257,24 +257,16 @@ static int xenbus_recv(uint32_t *reply_len, const char **reply_data, return 0; } - -/* Read a xenstore key. Returns a nul-terminated string (even if the XS - * data wasn't nul-terminated) or NULL. The returned string is in a - * static buffer, so only valid until the next xenstore/xenbus operation. - * If @default_resp is specified, it is returned in preference to a NULL or - * empty string received from xenstore. 
- */ -const char *xenstore_read(const char *path, const char *default_resp) +static const char *xenstore_read_common(const char *path, uint32_t *len, +const char *default_resp, bool is_dir) { -uint32_t len = 0, type = 0; +uint32_t type = 0, expected_type = is_dir ? XS_DIRECTORY : XS_READ; const char *answer = NULL; -xenbus_send(XS_READ, -path, strlen(path), -"", 1, /* nul separator */ +xenbus_send(expected_type, path, strlen(path), "", 1, /* nul separator */ NULL, 0); -if ( xenbus_recv(, , ) || (type != XS_READ) ) +if ( xenbus_recv(len, , ) || type != expected_type ) answer = NULL; if ( (default_resp != NULL) &&
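The XS_DIRECTORY reply that acpi_xs_directory() consumes is a packed sequence of NUL-terminated names; the number of entries is found by counting terminators. A self-contained version of the patch's count_strings():

```c
#include <assert.h>

/* Count NUL-terminated names packed into @len bytes at @strings,
 * as count_strings() in the hunk above does for XS_DIRECTORY data. */
static unsigned int count_strings(const char *strings, unsigned int len)
{
    const char *p;
    unsigned int n;

    for ( p = strings, n = 0; p < strings + len; p++ )
        if ( *p == '\0' )
            n++;

    return n;
}
```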
[Xen-devel] [RFC XEN PATCH v3 33/39] tools/libacpi: add a simple AML builder
It is used by libacpi to generate SSDTs from ACPI namespace devices built by the device model. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/firmware/hvmloader/Makefile | 3 +- tools/libacpi/aml_build.c | 326 ++ tools/libacpi/aml_build.h | 116 ++ tools/libxl/Makefile | 3 +- 4 files changed, 446 insertions(+), 2 deletions(-) create mode 100644 tools/libacpi/aml_build.c create mode 100644 tools/libacpi/aml_build.h diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile index 7c4c0ce535..3e917507c8 100644 --- a/tools/firmware/hvmloader/Makefile +++ b/tools/firmware/hvmloader/Makefile @@ -76,11 +76,12 @@ smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\"" ACPI_PATH = ../../libacpi DSDT_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c -ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o +ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o aml_build.o $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\" CFLAGS += -I$(ACPI_PATH) vpath build.c $(ACPI_PATH) vpath static_tables.c $(ACPI_PATH) +vpath aml_build.c $(ACPI_PATH) OBJS += $(ACPI_OBJS) hvmloader: $(OBJS) diff --git a/tools/libacpi/aml_build.c b/tools/libacpi/aml_build.c new file mode 100644 index 00..9b4e28ad95 --- /dev/null +++ b/tools/libacpi/aml_build.c @@ -0,0 +1,326 @@ +/* + * tools/libacpi/aml_build.c + * + * Copyright (C) 2017, Intel Corporation. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License, version 2.1, as published by the Free Software Foundation. 
+ * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; If not, see <http://www.gnu.org/licenses/>. + */ + +#include LIBACPI_STDUTILS +#include "libacpi.h" +#include "aml_build.h" + +#define AML_OP_SCOPE 0x10 +#define AML_OP_EXT 0x5B +#define AML_OP_DEVICE0x82 + +#define ACPI_NAMESEG_LEN 4 + +struct aml_build_alloctor { +struct acpi_ctxt *ctxt; +uint8_t *buf; +uint32_t capacity; +uint32_t used; +}; +static struct aml_build_alloctor alloc; + +static uint8_t *aml_buf_alloc(uint32_t size) +{ +uint8_t *buf = NULL; +struct acpi_ctxt *ctxt = alloc.ctxt; +uint32_t alloc_size, alloc_align = ctxt->min_alloc_byte_align; +uint32_t length = alloc.used + size; + +/* Overflow ... */ +if ( length < alloc.used ) +return NULL; + +if ( length <= alloc.capacity ) +{ +buf = alloc.buf + alloc.used; +alloc.used += size; +} +else +{ +alloc_size = length - alloc.capacity; +alloc_size = (alloc_size + alloc_align) & ~(alloc_align - 1); +buf = ctxt->mem_ops.alloc(ctxt, alloc_size, alloc_align); + +if ( buf && + buf == alloc.buf + alloc.capacity /* cont to existing buf */ ) +{ +alloc.capacity += alloc_size; +buf = alloc.buf + alloc.used; +alloc.used += size; +} +else +buf = NULL; +} + +return buf; +} + +static uint32_t get_package_length(uint8_t *pkg) +{ +uint32_t len; + +len = pkg - alloc.buf; +len = alloc.used - len; + +return len; +} + +/* + * On success, an object in the following form is stored at @buf. 
+ * @byte + * the original content in @buf + */ +static int build_prepend_byte(uint8_t *buf, uint8_t byte) +{ +uint32_t len; + +len = buf - alloc.buf; +len = alloc.used - len; + +if ( !aml_buf_alloc(sizeof(uint8_t)) ) +return -1; + +if ( len ) +memmove(buf + 1, buf, len); +buf[0] = byte; + +return 0; +} + +/* + * On success, an object in the following form is stored at @buf. + * AML encoding of four-character @name + * the original content in @buf + * + * Refer to ACPI spec 6.1, Sec 20.2.2 "Name Objects Encoding". + * + * XXX: names of multiple segments (e.g. X.Y.Z) are not supported + */ +static int build_prepend_name(uint8_t *buf, const char *name) +{ +uint8_t *p = buf; +const char *s = name; +uint32_t len, name_len; + +while ( *s == '\\' || *s == '^' ) +{ +if ( build_prepend_byte(p, (uint8_t) *s) ) +return -1; +++p; +++s; +} + +if ( !*s ) +return buil
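The builder emits AML innermost-first: each new byte is placed in front of everything already generated by shifting the existing bytes right. A toy model of build_prepend_byte() with a fixed-capacity buffer standing in for the patch's allocator:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Prepend @b before the @*len bytes already built at @buf, growing
 * the used length by one. Returns -1 when the buffer is full, which
 * models an allocation failure in the real code. */
static int prepend_byte(uint8_t *buf, uint32_t *len, uint32_t cap, uint8_t b)
{
    if (*len + 1 > cap)
        return -1;
    memmove(buf + 1, buf, *len);   /* shift existing object right */
    buf[0] = b;                    /* new byte becomes the prefix */
    (*len)++;
    return 0;
}
```

Prepending a ScopeOp (0x10, per the patch's AML_OP_SCOPE) in front of an already-built body is then a single call.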
[Xen-devel] [RFC XEN PATCH v3 20/39] tools/xen-ndctl: add option '--mgmt' to command 'list'
If the option '--mgmt' is present, the command 'list' will list all PMEM regions for management usage. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/misc/xen-ndctl.c | 39 +-- 1 file changed, 37 insertions(+), 2 deletions(-) diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c index 1289a83dbe..058f8ccaf5 100644 --- a/tools/misc/xen-ndctl.c +++ b/tools/misc/xen-ndctl.c @@ -57,9 +57,10 @@ static const struct xen_ndctl_cmd { .name= "list", -.syntax = "[--all | --raw ]", +.syntax = "[--all | --raw | --mgmt]", .help= "--all: the default option, list all PMEM regions of following types.\n" - "--raw: list all PMEM regions detected by Xen hypervisor.\n", + "--raw: list all PMEM regions detected by Xen hypervisor.\n" + "--mgmt: list all PMEM regions for management usage.\n", .handler = handle_list, .need_xc = true, }, @@ -162,12 +163,46 @@ static int handle_list_raw(void) return rc; } +static int handle_list_mgmt(void) +{ +int rc; +unsigned int nr = 0, i; +xen_sysctl_nvdimm_pmem_mgmt_region_t *mgmt_list; + +rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_MGMT, ); +if ( rc ) +{ +fprintf(stderr, "Cannot get the number of PMEM regions: %s.\n", +strerror(-rc)); +return rc; +} + +mgmt_list = malloc(nr * sizeof(*mgmt_list)); +if ( !mgmt_list ) +return -ENOMEM; + +rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_MGMT, mgmt_list, ); +if ( rc ) +goto out; + +printf("Management PMEM regions:\n"); +for ( i = 0; i < nr; i++ ) +printf(" %u: MFN 0x%lx - 0x%lx, used 0x%lx\n", + i, mgmt_list[i].smfn, mgmt_list[i].emfn, mgmt_list[i].used_mfns); + + out: +free(mgmt_list); + +return rc; +} + static const struct list_handlers { const char *option; int (*handler)(void); } list_hndrs[] = { { "--raw", handle_list_raw }, +{ "--mgmt", handle_list_mgmt }, }; static const unsigned int nr_list_hndrs = -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org 
[Xen-devel] [RFC XEN PATCH v3 24/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions
Allow XEN_SYSCTL_nvdimm_pmem_get_regions to return a list of data PMEM regions. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> --- tools/libxc/xc_misc.c | 8 xen/common/pmem.c | 46 + xen/include/public/sysctl.h | 12 3 files changed, 66 insertions(+) diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index db74df853a..93a1f8fdc5 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -944,6 +944,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type, size = sizeof(xen_sysctl_nvdimm_pmem_mgmt_region_t) * max; break; +case PMEM_REGION_TYPE_DATA: +size = sizeof(xen_sysctl_nvdimm_pmem_data_region_t) * max; +break; + default: return -EINVAL; } @@ -969,6 +973,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type, set_xen_guest_handle(regions->u_buffer.mgmt_regions, buffer); break; +case PMEM_REGION_TYPE_DATA: +set_xen_guest_handle(regions->u_buffer.data_regions, buffer); +break; + default: rc = -EINVAL; goto out; diff --git a/xen/common/pmem.c b/xen/common/pmem.c index cbe557c220..ed4a014c30 100644 --- a/xen/common/pmem.c +++ b/xen/common/pmem.c @@ -251,6 +251,48 @@ static int pmem_get_mgmt_regions( return rc; } +static int pmem_get_data_regions( +XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_data_region_t) regions, +unsigned int *num_regions) +{ +struct list_head *cur; +unsigned int nr = 0, max = *num_regions; +xen_sysctl_nvdimm_pmem_data_region_t region; +int rc = 0; + +if ( !guest_handle_okay(regions, max * sizeof(region)) ) +return -EINVAL; + +spin_lock(_data_lock); + +list_for_each(cur, _data_regions) +{ +struct pmem *pmem = list_entry(cur, struct pmem, link); + +if ( nr >= max ) +break; + +region.smfn = pmem->smfn; +region.emfn = pmem->emfn; +region.mgmt_smfn = pmem->u.data.mgmt_smfn; +region.mgmt_emfn = pmem->u.data.mgmt_emfn; + +if ( 
copy_to_guest_offset(regions, nr, , 1) ) +{ +rc = -EFAULT; +break; +} + +nr++; +} + +spin_unlock(_data_lock); + +*num_regions = nr; + +return rc; +} + static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions) { unsigned int type = regions->type, max = regions->num_regions; @@ -269,6 +311,10 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions) rc = pmem_get_mgmt_regions(regions->u_buffer.mgmt_regions, ); break; +case PMEM_REGION_TYPE_DATA: +rc = pmem_get_data_regions(regions->u_buffer.data_regions, ); +break; + default: rc = -EINVAL; } diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h index d7c12f23fb..8595ea438a 100644 --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -1141,6 +1141,16 @@ struct xen_sysctl_nvdimm_pmem_mgmt_region { typedef struct xen_sysctl_nvdimm_pmem_mgmt_region xen_sysctl_nvdimm_pmem_mgmt_region_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_mgmt_region_t); +/* PMEM_REGION_TYPE_DATA */ +struct xen_sysctl_nvdimm_pmem_data_region { +uint64_t smfn; +uint64_t emfn; +uint64_t mgmt_smfn; +uint64_t mgmt_emfn; +}; +typedef struct xen_sysctl_nvdimm_pmem_data_region xen_sysctl_nvdimm_pmem_data_region_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_data_region_t); + /* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */ struct xen_sysctl_nvdimm_pmem_regions_nr { uint8_t type; /* IN: one of PMEM_REGION_TYPE_* */ @@ -1161,6 +1171,8 @@ struct xen_sysctl_nvdimm_pmem_regions { XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) raw_regions; /* if type == PMEM_REGION_TYPE_MGMT */ XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) mgmt_regions; +/* if type == PMEM_REGION_TYPE_DATA */ +XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_data_region_t) data_regions; } u_buffer; /* IN: the guest handler where the entries of PMEM regions of the type @type are returned */ }; -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
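All of the region getters share one shape: walk the list, stop at the caller-supplied capacity, and hand back the count actually written through the same in/out parameter. A simplified array-based model (locking and copy_to_guest_offset() omitted):

```c
#include <assert.h>

struct data_region { unsigned long smfn, emfn; };

/* On entry *num is the capacity of @dst; on exit it is the number of
 * entries actually written, matching the sysctl's num_regions
 * semantics. An array stands in for the hypervisor's pmem list. */
static int get_regions(const struct data_region *src, unsigned int avail,
                       struct data_region *dst, unsigned int *num)
{
    unsigned int nr = 0, max = *num;

    while (nr < max && nr < avail) {
        dst[nr] = src[nr];
        nr++;
    }
    *num = nr;
    return 0;
}
```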
[Xen-devel] [RFC XEN PATCH v3 31/39] tools/libacpi: add callback to translate GPA to GVA
The location of ACPI blobs passed from the device model is given as guest physical addresses. libacpi needs to convert the guest physical address to a guest virtual address before it can access those ACPI blobs. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/firmware/hvmloader/util.c | 6 ++ tools/firmware/hvmloader/util.h | 1 + tools/libacpi/libacpi.h | 1 + tools/libxl/libxl_x86_acpi.c| 10 ++ 4 files changed, 18 insertions(+) diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c index c2218d9fcb..2f8a4654b0 100644 --- a/tools/firmware/hvmloader/util.c +++ b/tools/firmware/hvmloader/util.c @@ -871,6 +871,11 @@ static unsigned long acpi_v2p(struct acpi_ctxt *ctxt, void *v) return virt_to_phys(v); } +static void *acpi_p2v(struct acpi_ctxt *ctxt, unsigned long p) +{ +return phys_to_virt(p); +} + static void *acpi_mem_alloc(struct acpi_ctxt *ctxt, uint32_t size, uint32_t align) { @@ -989,6 +994,7 @@ void hvmloader_acpi_build_tables(struct acpi_config *config, ctxt.mem_ops.alloc = acpi_mem_alloc; ctxt.mem_ops.free = acpi_mem_free; ctxt.mem_ops.v2p = acpi_v2p; +ctxt.mem_ops.p2v = acpi_p2v; ctxt.min_alloc_byte_align = 16; diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h index 2ef854eb8f..e9fe6c6e79 100644 --- a/tools/firmware/hvmloader/util.h +++ b/tools/firmware/hvmloader/util.h @@ -200,6 +200,7 @@ xen_pfn_t mem_hole_alloc(uint32_t nr_mfns); /* Allocate memory in a reserved region below 4GB. 
*/ void *mem_alloc(uint32_t size, uint32_t align); #define virt_to_phys(v) ((unsigned long)(v)) +#define phys_to_virt(v) ((void *)(p)) /* Allocate memory in a scratch region */ void *scratch_alloc(uint32_t size, uint32_t align); diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h index 157f63f7bc..f5a1c384bc 100644 --- a/tools/libacpi/libacpi.h +++ b/tools/libacpi/libacpi.h @@ -51,6 +51,7 @@ struct acpi_ctxt { void *(*alloc)(struct acpi_ctxt *ctxt, uint32_t size, uint32_t align); void (*free)(struct acpi_ctxt *ctxt, void *v, uint32_t size); unsigned long (*v2p)(struct acpi_ctxt *ctxt, void *v); +void *(*p2v)(struct acpi_ctxt *ctxt, unsigned long p); } mem_ops; uint32_t min_alloc_byte_align; /* minimum alignment used by mem_ops.alloc */ diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c index 3b79b2179b..b14136949c 100644 --- a/tools/libxl/libxl_x86_acpi.c +++ b/tools/libxl/libxl_x86_acpi.c @@ -52,6 +52,15 @@ static unsigned long virt_to_phys(struct acpi_ctxt *ctxt, void *v) libxl_ctxt->alloc_base_paddr); } +static void *phys_to_virt(struct acpi_ctxt *ctxt, unsigned long p) +{ +struct libxl_acpi_ctxt *libxl_ctxt = +CONTAINER_OF(ctxt, struct libxl_acpi_ctxt, c); + +return (void *)((p - libxl_ctxt->alloc_base_paddr) + +libxl_ctxt->alloc_base_vaddr); +} + static void *mem_alloc(struct acpi_ctxt *ctxt, uint32_t size, uint32_t align) { @@ -181,6 +190,7 @@ int libxl__dom_load_acpi(libxl__gc *gc, libxl_ctxt.c.mem_ops.alloc = mem_alloc; libxl_ctxt.c.mem_ops.v2p = virt_to_phys; +libxl_ctxt.c.mem_ops.p2v = phys_to_virt; libxl_ctxt.c.mem_ops.free = acpi_mem_free; libxl_ctxt.c.min_alloc_byte_align = 16; -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
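In the libxl implementation both translations are a fixed offset between the build-time virtual mapping and the guest-physical placement, so p2v must be the exact inverse of v2p. A standalone model (the structure name and base addresses are made up):

```c
#include <assert.h>
#include <stdint.h>

struct acpi_build_ctxt { uintptr_t base_paddr, base_vaddr; };

/* Guest-virtual to guest-physical: subtract the mapping base, add the
 * physical base (cf. virt_to_phys() in libxl_x86_acpi.c). */
static uintptr_t v2p(const struct acpi_build_ctxt *c, void *v)
{
    return (uintptr_t)v - c->base_vaddr + c->base_paddr;
}

/* The inverse, needed to reach ACPI blobs whose location the device
 * model reports as a guest physical address. */
static void *p2v(const struct acpi_build_ctxt *c, uintptr_t p)
{
    return (void *)(p - c->base_paddr + c->base_vaddr);
}
```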
[Xen-devel] [RFC XEN PATCH v3 35/39] tools/libacpi: load ACPI built by the device model
ACPI tables built by the device model, whose signatures do not conflict with tables built by Xen except SSDT, are loaded after ACPI tables built by Xen. ACPI namespace devices built by the device model, whose names do not conflict with devices built by Xen, are assembled and placed in SSDTs after ACPI tables built by Xen. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/firmware/hvmloader/util.c | 15 +++ tools/libacpi/acpi2_0.h | 2 + tools/libacpi/build.c | 237 tools/libacpi/libacpi.h | 5 + 4 files changed, 259 insertions(+) diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c index 5b8a4ee9d0..0468fea490 100644 --- a/tools/firmware/hvmloader/util.c +++ b/tools/firmware/hvmloader/util.c @@ -1019,6 +1019,21 @@ void hvmloader_acpi_build_tables(struct acpi_config *config, if ( !strncmp(xenstore_read("platform/acpi_laptop_slate", "0"), "1", 1) ) config->table_flags |= ACPI_HAS_SSDT_LAPTOP_SLATE; +s = xenstore_read(HVM_XS_DM_ACPI_ADDRESS, NULL); +if ( s ) +{ +config->dm.addr = strtoll(s, NULL, 0); + +s = xenstore_read(HVM_XS_DM_ACPI_LENGTH, NULL); +if ( s ) +{ +config->dm.length = strtoll(s, NULL, 0); +config->table_flags |= ACPI_HAS_DM; +} +else +config->dm.addr = 0; +} + config->table_flags |= (ACPI_HAS_TCPA | ACPI_HAS_IOAPIC | ACPI_HAS_WAET | ACPI_HAS_PMTIMER | ACPI_HAS_BUTTONS | ACPI_HAS_VGA | diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h index 2619ba32db..365825e6bc 100644 --- a/tools/libacpi/acpi2_0.h +++ b/tools/libacpi/acpi2_0.h @@ -435,6 +435,7 @@ struct acpi_20_slit { #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T') #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T') #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T') +#define ACPI_2_0_SSDT_SIGNATURE ASCII32('S','S','D','T') /* * Table revision numbers. 
@@ -449,6 +450,7 @@ struct acpi_20_slit { #define ACPI_1_0_FADT_REVISION 0x01 #define ACPI_2_0_SRAT_REVISION 0x01 #define ACPI_2_0_SLIT_REVISION 0x01 +#define ACPI_2_0_SSDT_REVISION 0x02 #pragma pack () diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c index 493ca48025..8ec1dfda5f 100644 --- a/tools/libacpi/build.c +++ b/tools/libacpi/build.c @@ -15,6 +15,7 @@ #include LIBACPI_STDUTILS #include "acpi2_0.h" +#include "aml_build.h" #include "libacpi.h" #include "ssdt_s3.h" #include "ssdt_s4.h" @@ -56,6 +57,9 @@ struct acpi_info { uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */ }; +#define DM_ACPI_BLOB_TYPE_TABLE 0 /* ACPI table */ +#define DM_ACPI_BLOB_TYPE_NSDEV 1 /* AML of an ACPI namespace device */ + /* ACPI tables of following signatures should not appear in DM ACPI */ static uint64_t dm_acpi_signature_blacklist[64]; /* ACPI namespace devices of following names should not appear in DM ACPI */ @@ -141,6 +145,233 @@ static void set_checksum( p[checksum_offset] = -sum; } +static bool has_dm_tables(struct acpi_ctxt *ctxt, + const struct acpi_config *config) +{ +char **dir; +unsigned int num; + +if ( !(config->table_flags & ACPI_HAS_DM) || !config->dm.addr ) +return false; + +dir = ctxt->xs_ops.directory(ctxt, HVM_XS_DM_ACPI_ROOT, ); +if ( !dir || !num ) +return false; + +return true; +} + +/* Return true if no collision is found. */ +static bool check_signature_collision(uint64_t sig) +{ +unsigned int i; +for ( i = 0; i < ARRAY_SIZE(dm_acpi_signature_blacklist); i++ ) +{ +if ( sig == dm_acpi_signature_blacklist[i] ) +return false; +} +return true; +} + +/* Return true if no collision is found. 
*/ +static int check_devname_collision(const char *name) +{ +unsigned int i; +for ( i = 0; i < ARRAY_SIZE(dm_acpi_devname_blacklist); i++ ) +{ +if ( !strncmp(name, dm_acpi_devname_blacklist[i], 4) ) +return false; +} +return true; +} + +static const char *xs_read_dm_acpi_blob_key(struct acpi_ctxt *ctxt, +const char *name, const char *key) +{ +/* + * @name is supposed to be 4 characters at most, and the longest @key + * so far is 'address' (7), so 30 characters is enough to hold the + * longest path HVM_XS_DM_ACPI_ROOT/name/key. + */ +#define DM_ACPI_BLOB_PATH_MAX_LENGTH 30 +char path[DM_ACPI_BLOB_PATH_MAX_LENGTH]; +snprintf(path, DM_ACPI_BLOB_PATH_MAX_LENGTH, HVM_XS_DM_
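The collision check compares the 32-bit table signature against a blacklist of signatures Xen itself builds. A self-contained version of that shape (ASCII32 as in acpi2_0.h; the blacklist contents below are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a 4-character ACPI table signature, little-endian. */
#define ASCII32(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* A DM-provided table is accepted only if its signature does not
 * collide with one Xen already builds. */
static bool sig_allowed(uint32_t sig, const uint32_t *blacklist,
                        unsigned int n)
{
    unsigned int i;

    for (i = 0; i < n; i++)
        if (sig == blacklist[i])
            return false;
    return true;
}
```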
[Xen-devel] [RFC XEN PATCH v3 18/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
Allow XEN_SYSCTL_nvdimm_pmem_get_regions_nr to return the number of management PMEM regions. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> --- tools/libxc/xc_misc.c | 4 +++- xen/common/pmem.c | 4 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index bebe6d04c8..4b5558aaa5 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -894,7 +894,9 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr) xen_sysctl_nvdimm_op_t *nvdimm = int rc; -if ( !nr || type != PMEM_REGION_TYPE_RAW ) +if ( !nr || + (type != PMEM_REGION_TYPE_RAW && + type != PMEM_REGION_TYPE_MGMT) ) return -EINVAL; sysctl.cmd = XEN_SYSCTL_nvdimm_op; diff --git a/xen/common/pmem.c b/xen/common/pmem.c index 7a081c2879..54b3e7119a 100644 --- a/xen/common/pmem.c +++ b/xen/common/pmem.c @@ -142,6 +142,10 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr) regions_nr->num_regions = nr_raw_regions; break; +case PMEM_REGION_TYPE_MGMT: +regions_nr->num_regions = nr_mgmt_regions; +break; + default: rc = -EINVAL; } -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC XEN PATCH v3 29/39] tools: reserve guest memory for ACPI from device model
Some virtual devices (e.g. NVDIMM) require complex ACPI tables and definition blocks (in AML), which a device model (e.g. QEMU) has already been able to construct. Instead of introducing the redundant implementation to Xen, we would like to reuse the device model to construct those ACPI stuffs. This commit allows Xen to reserve an area in the guest memory for the device model to pass its ACPI tables and definition blocks to guest, which will be loaded by hvmloader. The base guest physical address and the size of the reserved area are passed to the device model via XenStore keys hvmloader/dm-acpi/{address, length}. An xl config "dm_acpi_pages = N" is added to specify the number of reserved guest memory pages. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/libxc/include/xc_dom.h| 1 + tools/libxc/xc_dom_x86.c| 13 + tools/libxl/libxl_dom.c | 25 + tools/libxl/libxl_types.idl | 1 + tools/xl/xl_parse.c | 17 - xen/include/public/hvm/hvm_xs_strings.h | 8 6 files changed, 64 insertions(+), 1 deletion(-) diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h index ce47058c41..7c541576e7 100644 --- a/tools/libxc/include/xc_dom.h +++ b/tools/libxc/include/xc_dom.h @@ -93,6 +93,7 @@ struct xc_dom_image { struct xc_dom_seg pgtables_seg; struct xc_dom_seg devicetree_seg; struct xc_dom_seg start_info_seg; /* HVMlite only */ +struct xc_dom_seg dm_acpi_seg;/* reserved PFNs for DM ACPI */ xen_pfn_t start_info_pfn; xen_pfn_t console_pfn; xen_pfn_t xenstore_pfn; diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index cb68efcbd3..8755350295 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -674,6 +674,19 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom) ioreq_server_pfn(0)); xc_hvm_param_set(xch, domid, HVM_PARAM_NR_IOREQ_SERVER_PAGES, NR_IOREQ_SERVER_PAGES); + +if ( dom->dm_acpi_seg.pages ) +{ +size_t acpi_size = 
dom->dm_acpi_seg.pages * XC_DOM_PAGE_SIZE(dom); + +rc = xc_dom_alloc_segment(dom, >dm_acpi_seg, "DM ACPI", + 0, acpi_size); +if ( rc != 0 ) +{ +DOMPRINTF("Unable to reserve memory for DM ACPI"); +goto out; +} +} } rc = xc_dom_alloc_segment(dom, >start_info_seg, diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index f54fd49a73..bad1719892 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -897,6 +897,29 @@ static int hvm_build_set_xs_values(libxl__gc *gc, goto err; } +if (dom->dm_acpi_seg.pages) { +uint64_t guest_addr_out = dom->dm_acpi_seg.pfn * XC_DOM_PAGE_SIZE(dom); + +if (guest_addr_out >= 0x1ULL) { +LOG(ERROR, +"Guest address of DM ACPI is 0x%"PRIx64", but expected below 4G", +guest_addr_out); +goto err; +} + +path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_ADDRESS, domid); +ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64, guest_addr_out); +if (ret) +goto err; + +path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_LENGTH, domid); +ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64, + (uint64_t)(dom->dm_acpi_seg.pages * + XC_DOM_PAGE_SIZE(dom))); +if (ret) +goto err; +} + return 0; err: @@ -1184,6 +1207,8 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, dom->vnode_to_pnode[i] = info->vnuma_nodes[i].pnode; } +dom->dm_acpi_seg.pages = info->u.hvm.dm_acpi_pages; + rc = libxl__build_dom(gc, domid, info, state, dom); if (rc != 0) goto out; diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 173d70acec..4acc0457f4 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -565,6 +565,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("rdm", libxl_rdm_reserve), ("rdm_mem_boundary_memkb", MemKB), ("mca_caps", uint64), + ("dm_acpi_pages",integer), ])), ("pv", Struct(None, [("kernel", string),
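The XenStore keys are only written when the whole reserved area starts below 4 GB, since hvmloader reads it from 32-bit-addressable guest memory. A sketch of that address/length computation (PAGE_SIZE fixed at 4 KiB for illustration; the real code uses XC_DOM_PAGE_SIZE()):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Compute the DM-ACPI address/length pair exported via
 * hvmloader/dm-acpi/{address,length}; reject areas at or above 4 GB,
 * as hvm_build_set_xs_values() does. */
static int dm_acpi_range(uint64_t pfn, uint64_t pages,
                         uint64_t *addr, uint64_t *len)
{
    uint64_t a = pfn * PAGE_SIZE;

    if (a >= (1ULL << 32))
        return -1;
    *addr = a;
    *len = pages * PAGE_SIZE;
    return 0;
}
```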
[Xen-devel] [RFC XEN PATCH v3 38/39] tools/libxl: initiate PMEM mapping via QMP callback
The base guest physical address of each vNVDIMM device is decided by QEMU. Add a QMP callback to get the base address from QEMU and query Xen hypervisor to map host PMEM pages to that address. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/libxl/libxl_qmp.c | 130 tools/libxl/libxl_vnvdimm.c | 30 ++ tools/libxl/libxl_vnvdimm.h | 30 ++ 3 files changed, 190 insertions(+) create mode 100644 tools/libxl/libxl_vnvdimm.h diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index e1eb47c1d2..299f9c8260 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -26,6 +26,7 @@ #include "_libxl_list.h" #include "libxl_internal.h" +#include "libxl_vnvdimm.h" /* #define DEBUG_RECEIVED */ @@ -1170,6 +1171,127 @@ int libxl_qemu_monitor_command(libxl_ctx *ctx, uint32_t domid, return rc; } +#if defined(__linux__) + +static int qmp_register_vnvdimm_callback(libxl__qmp_handler *qmp, + const libxl__json_object *o, + void *arg) +{ +GC_INIT(qmp->ctx); +const libxl_domain_config *guest_config = arg; +const libxl_device_vnvdimm *vnvdimm; +const libxl__json_object *obj, *sub_map, *sub_obj; +const char *id, *expected_id; +unsigned int i, slot; +unsigned long gpa, size, mfn, gpfn, nr_pages; +int rc = 0; + +for (i = 0; (obj = libxl__json_array_get(o, i)); i++) { +if (!libxl__json_object_is_map(obj)) +continue; + +sub_map = libxl__json_map_get("data", obj, JSON_MAP); +if (!sub_map) +continue; + +sub_obj = libxl__json_map_get("slot", sub_map, JSON_INTEGER); +slot = libxl__json_object_get_integer(sub_obj); +if (slot > guest_config->num_vnvdimms) { +LOG(ERROR, +"Invalid QEMU memory device slot %u, expecting less than %u", +slot, guest_config->num_vnvdimms); +rc = -ERROR_INVAL; +goto out; +} +vnvdimm = _config->vnvdimms[slot]; + +/* + * Double check whether it's a NVDIMM memory device, through + * all memory devices in QEMU on Xen are for vNVDIMM. 
+ */ +expected_id = libxl__sprintf(gc, "xen_nvdimm%u", slot + 1); +if (!expected_id) { +LOG(ERROR, "Cannot build device id"); +rc = -ERROR_FAIL; +goto out; +} +sub_obj = libxl__json_map_get("id", sub_map, JSON_STRING); +id = libxl__json_object_get_string(sub_obj); +if (!id || strncmp(id, expected_id, strlen(expected_id))) { +LOG(ERROR, +"Invalid QEMU memory device id %s, expecting %s", +id, expected_id); +rc = -ERROR_FAIL; +goto out; +} + +sub_obj = libxl__json_map_get("addr", sub_map, JSON_INTEGER); +gpa = libxl__json_object_get_integer(sub_obj); +sub_obj = libxl__json_map_get("size", sub_map, JSON_INTEGER); +size = libxl__json_object_get_integer(sub_obj); +if ((gpa | size) & ~XC_PAGE_MASK) { +LOG(ERROR, +"Invalid address 0x%lx or size 0x%lx of QEMU memory device %s, " +"not aligned to 0x%lx", +gpa, size, id, XC_PAGE_SIZE); +rc = -ERROR_INVAL; +goto out; +} +gpfn = gpa >> XC_PAGE_SHIFT; + +nr_pages = size >> XC_PAGE_SHIFT; +if (nr_pages > vnvdimm->nr_pages) { +LOG(ERROR, +"Invalid size 0x%lx of QEMU memory device %s, " +"expecting no larger than 0x%lx", +size, id, vnvdimm->nr_pages << XC_PAGE_SHIFT); +rc = -ERROR_INVAL; +goto out; +} + +switch (vnvdimm->backend_type) { +case LIBXL_VNVDIMM_BACKEND_TYPE_MFN: +mfn = vnvdimm->u.mfn; +break; + +default: +LOG(ERROR, "Invalid NVDIMM backend type %u", vnvdimm->backend_type); +rc = -ERROR_INVAL; +goto out; +} + +rc = libxl_vnvdimm_add_pages(gc, qmp->domid, mfn, gpfn, nr_pages); +if (rc) { +LOG(ERROR, +"Cannot map PMEM pages for QEMU memory device %s, " +"mfn 0x%lx, gpfn 0x%lx, nr 0x%lx, rc %d", +id, mfn, gpfn, nr_pages, rc); +rc = -ERROR_FAIL; +goto out; +} +} + + out: +GC_FREE; +return rc; +} + +static int libxl__qmp_query_vnvdimms(libxl_
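The address/size sanity check performed by the QMP callback above reduces to a single mask test before the values are converted to frame numbers. A minimal sketch, with XC_PAGE_* constants assumed to match Xen's 4 KiB pages and helper names invented here:

```c
#include <stdbool.h>

#define XC_PAGE_SHIFT 12
#define XC_PAGE_SIZE  (1UL << XC_PAGE_SHIFT)
#define XC_PAGE_MASK  (~(XC_PAGE_SIZE - 1))

/*
 * Both the guest physical address and the size reported by QEMU for a
 * memory device must be page-aligned; OR-ing them lets one mask test
 * catch a misalignment in either.
 */
static bool vnvdimm_addr_size_ok(unsigned long gpa, unsigned long size)
{
    return ((gpa | size) & ~XC_PAGE_MASK) == 0;
}

/* gpfn / nr_pages derivation once alignment is known to hold. */
static unsigned long to_frame(unsigned long addr)
{
    return addr >> XC_PAGE_SHIFT;
}
```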
[Xen-devel] [RFC XEN PATCH v3 27/39] xen/pmem: release PMEM pages on HVM domain destruction
A new step RELMEM_pmem is added and taken before RELMEM_xen to release all PMEM pages mapped to a HVM domain. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: George Dunlap <george.dun...@eu.citrix.com> --- xen/arch/x86/domain.c| 32 xen/arch/x86/mm.c| 9 +++-- xen/common/pmem.c| 10 ++ xen/include/asm-x86/domain.h | 1 + xen/include/xen/pmem.h | 6 ++ 5 files changed, 52 insertions(+), 6 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index dbddc536d3..1c4e788780 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1755,11 +1755,15 @@ static int relinquish_memory( { struct page_info *page; unsigned long x, y; +bool is_pmem_list = (list == >pmem_page_list); int ret = 0; /* Use a recursive lock, as we may enter 'free_domheap_page'. */ spin_lock_recursive(>page_alloc_lock); +if ( is_pmem_list ) +spin_lock(>pmem_lock); + while ( (page = page_list_remove_head(list)) ) { /* Grab a reference to the page so it won't disappear from under us. */ @@ -1841,8 +1845,9 @@ static int relinquish_memory( } } -/* Put the page on the list and /then/ potentially free it. */ -page_list_add_tail(page, >arch.relmem_list); +if ( !is_pmem_list ) +/* Put the page on the list and /then/ potentially free it. */ +page_list_add_tail(page, >arch.relmem_list); put_page(page); if ( hypercall_preempt_check() ) @@ -1852,10 +1857,13 @@ static int relinquish_memory( } } -/* list is empty at this point. */ -page_list_move(list, >arch.relmem_list); +if ( !is_pmem_list ) +/* list is empty at this point. 
*/ +page_list_move(list, >arch.relmem_list); out: +if ( is_pmem_list ) +spin_unlock(>pmem_lock); spin_unlock_recursive(>page_alloc_lock); return ret; } @@ -1922,13 +1930,29 @@ int domain_relinquish_resources(struct domain *d) return ret; } +#ifndef CONFIG_NVDIMM_PMEM d->arch.relmem = RELMEM_xen; +#else +d->arch.relmem = RELMEM_pmem; +#endif spin_lock(>page_alloc_lock); page_list_splice(>arch.relmem_list, >page_list); INIT_PAGE_LIST_HEAD(>arch.relmem_list); spin_unlock(>page_alloc_lock); +#ifdef CONFIG_NVDIMM_PMEM +/* Fallthrough. Relinquish every page of PMEM. */ +case RELMEM_pmem: +if ( is_hvm_domain(d) ) +{ +ret = relinquish_memory(d, >pmem_page_list, ~0UL); +if ( ret ) +return ret; +} +d->arch.relmem = RELMEM_xen; +#endif + /* Fallthrough. Relinquish every page of memory. */ case RELMEM_xen: ret = relinquish_memory(d, >xenpage_list, ~0UL); diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 93ccf198c9..26f9e5a13e 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -106,6 +106,7 @@ #include #include #include +#include #include #include #include @@ -2341,8 +2342,12 @@ void put_page(struct page_info *page) if ( unlikely((nx & PGC_count_mask) == 0) ) { -if ( !is_pmem_page(page) /* PMEM page is not allocated from Xen heap. 
*/ - && cleanup_page_cacheattr(page) == 0 ) +#ifdef CONFIG_NVDIMM_PMEM +if ( is_pmem_page(page) ) +pmem_page_cleanup(page); +else +#endif +if ( cleanup_page_cacheattr(page) == 0 ) free_domheap_page(page); else gdprintk(XENLOG_WARNING, diff --git a/xen/common/pmem.c b/xen/common/pmem.c index 2f9ad64a26..8b9378dce6 100644 --- a/xen/common/pmem.c +++ b/xen/common/pmem.c @@ -741,6 +741,16 @@ int pmem_populate(struct xen_pmem_map_args *args) return rc; } +void pmem_page_cleanup(struct page_info *page) +{ +ASSERT(is_pmem_page(page)); +ASSERT((page->count_info & PGC_count_mask) == 0); + +page->count_info = PGC_pmem_page | PGC_state_free; +page_set_owner(page, NULL); +set_gpfn_from_mfn(page_to_mfn(page), INVALID_M2P_ENTRY); +} + int __init pmem_dom0_setup_permission(struct domain *d) { struct list_head *cur; diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index fb8bf17458..8322546b5d 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -303,6 +303,7 @@ struct arch_domain enum { RELMEM_not_started, RELMEM_shared, +RELMEM_pmem, RELMEM_xen, RELMEM_l4, RELMEM_l3, diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h index 2dab90530b..dfbc412065 100644 --- a/xen/include/xen/pmem.h +++ b/xen/in
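The new relinquish ordering introduced by this patch can be modelled as a small state machine: RELMEM_pmem is visited before RELMEM_xen so PMEM pages (kept on d->pmem_page_list rather than d->arch.relmem_list) are released first. The enum mirrors asm-x86/domain.h after the patch; `next_relmem_state()` is purely illustrative, not Xen code:

```c
enum relmem_state {
    RELMEM_not_started,
    RELMEM_shared,
    RELMEM_pmem,   /* new in this patch, only with CONFIG_NVDIMM_PMEM */
    RELMEM_xen,
    RELMEM_l4,
    RELMEM_l3,
    RELMEM_l2,
    RELMEM_done,
};

/* Illustrative: which state domain_relinquish_resources() falls through to. */
static enum relmem_state next_relmem_state(enum relmem_state s, int pmem_enabled)
{
    if (s == RELMEM_shared)
        return pmem_enabled ? RELMEM_pmem : RELMEM_xen;
    if (s == RELMEM_pmem)
        return RELMEM_xen;
    return (enum relmem_state)(s + 1);
}
```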
[Xen-devel] [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conversions in cleanup_frame_table()
Replace pdx_to_page(pfn_to_pdx(pfn)) by mfn_to_page(pfn), which is identical to the former. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- xen/arch/x86/x86_64/mm.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c index 6c5221f90c..c93383d7d9 100644 --- a/xen/arch/x86/x86_64/mm.c +++ b/xen/arch/x86/x86_64/mm.c @@ -720,12 +720,11 @@ static void cleanup_frame_table(struct mem_hotadd_info *info) spfn = info->spfn; epfn = info->epfn; -sva = (unsigned long)pdx_to_page(pfn_to_pdx(spfn)); -eva = (unsigned long)pdx_to_page(pfn_to_pdx(epfn)); +sva = (unsigned long)mfn_to_page(spfn); +eva = (unsigned long)mfn_to_page(epfn); /* Intialize all page */ -memset(mfn_to_page(spfn), -1, - (unsigned long)mfn_to_page(epfn) - (unsigned long)mfn_to_page(spfn)); +memset((void *)sva, -1, eva - sva); while (sva < eva) { -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
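Why the replacement above is an identity can be seen in a simplified model: mfn_to_page() is itself defined via the PDX translation, so spelling out pdx_to_page(pfn_to_pdx(...)) was redundant. The helper bodies below are stand-ins assuming no PDX compression (in real Xen, pfn_to_pdx may squash a hole of always-zero address bits, but the composition still holds):

```c
#include <stdint.h>

typedef struct { uint64_t _pad[8]; } page_info_t;  /* stand-in for struct page_info */

static page_info_t frame_table[64];

/* No PDX compression in this model: pfn_to_pdx is the identity. */
static uint64_t pfn_to_pdx(uint64_t pfn) { return pfn; }

static page_info_t *pdx_to_page(uint64_t pdx) { return &frame_table[pdx]; }

/* mfn_to_page() is defined as the composition of the two above. */
static page_info_t *mfn_to_page(uint64_t mfn)
{
    return pdx_to_page(pfn_to_pdx(mfn));
}
```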
[Xen-devel] [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0
... to avoid the inference with the PMEM driver and management utilities in Dom0. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Gang Wei <gang@intel.com> Cc: Shane Wang <shane.w...@intel.com> --- xen/arch/x86/acpi/power.c | 7 +++ xen/arch/x86/dom0_build.c | 5 + xen/arch/x86/shutdown.c | 3 +++ xen/arch/x86/tboot.c | 4 xen/common/kexec.c| 3 +++ xen/common/pmem.c | 21 + xen/drivers/acpi/nfit.c | 21 + xen/include/xen/acpi.h| 2 ++ xen/include/xen/pmem.h| 13 + 9 files changed, 79 insertions(+) diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c index 1e4e5680a7..d135715a49 100644 --- a/xen/arch/x86/acpi/power.c +++ b/xen/arch/x86/acpi/power.c @@ -178,6 +178,10 @@ static int enter_state(u32 state) freeze_domains(); +#ifdef CONFIG_NVDIMM_PMEM +acpi_nfit_reinstate(); +#endif + acpi_dmar_reinstate(); if ( (error = disable_nonboot_cpus()) ) @@ -260,6 +264,9 @@ static int enter_state(u32 state) mtrr_aps_sync_end(); adjust_vtd_irq_affinities(); acpi_dmar_zap(); +#ifdef CONFIG_NVDIMM_PMEM +acpi_nfit_zap(); +#endif thaw_domains(); system_state = SYS_STATE_active; spin_unlock(_lock); diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c index f616b99ddc..10741e865a 100644 --- a/xen/arch/x86/dom0_build.c +++ b/xen/arch/x86/dom0_build.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -452,6 +453,10 @@ int __init dom0_setup_permissions(struct domain *d) rc |= rangeset_add_singleton(mmio_ro_ranges, mfn); } +#ifdef CONFIG_NVDIMM_PMEM +rc |= pmem_dom0_setup_permission(d); +#endif + return rc; } diff --git a/xen/arch/x86/shutdown.c b/xen/arch/x86/shutdown.c index a87aa60add..1902dfe73e 100644 --- a/xen/arch/x86/shutdown.c +++ b/xen/arch/x86/shutdown.c @@ -550,6 +550,9 @@ void machine_restart(unsigned int delay_millisecs) if ( tboot_in_measured_env() ) { +#ifdef CONFIG_NVDIMM_PMEM +acpi_nfit_reinstate(); +#endif 
acpi_dmar_reinstate(); tboot_shutdown(TB_SHUTDOWN_REBOOT); } diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c index 59d7c477f4..24e3b81ff1 100644 --- a/xen/arch/x86/tboot.c +++ b/xen/arch/x86/tboot.c @@ -488,6 +488,10 @@ int __init tboot_parse_dmar_table(acpi_table_handler dmar_handler) /* but dom0 will read real table, so must zap it there too */ acpi_dmar_zap(); +#ifdef CONFIG_NVDIMM_PMEM +acpi_nfit_zap(); +#endif + return rc; } diff --git a/xen/common/kexec.c b/xen/common/kexec.c index fcc68bd4d8..c8c6138e71 100644 --- a/xen/common/kexec.c +++ b/xen/common/kexec.c @@ -366,6 +366,9 @@ static int kexec_common_shutdown(void) watchdog_disable(); console_start_sync(); spin_debug_disable(); +#ifdef CONFIG_NVDIMM_PMEM +acpi_nfit_reinstate(); +#endif acpi_dmar_reinstate(); return 0; diff --git a/xen/common/pmem.c b/xen/common/pmem.c index 49648222a6..c9f5f6e904 100644 --- a/xen/common/pmem.c +++ b/xen/common/pmem.c @@ -18,6 +18,8 @@ #include #include +#include +#include #include /* @@ -128,3 +130,22 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm) return rc; } + +#ifdef CONFIG_X86 + +int __init pmem_dom0_setup_permission(struct domain *d) +{ +struct list_head *cur; +struct pmem *pmem; +int rc = 0; + +list_for_each(cur, _raw_regions) +{ +pmem = list_entry(cur, struct pmem, link); +rc |= iomem_deny_access(d, pmem->smfn, pmem->emfn - 1); +} + +return rc; +} + +#endif /* CONFIG_X86 */ diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c index 68750c2edc..5f34cf2464 100644 --- a/xen/drivers/acpi/nfit.c +++ b/xen/drivers/acpi/nfit.c @@ -179,6 +179,24 @@ static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc) } } +void acpi_nfit_zap(void) +{ +uint32_t sig = 0x4e494654; /* "TFIN" */ + +if ( nfit_desc.acpi_table ) +write_atomic((uint32_t *)_desc.acpi_table->header.signature[0], + sig); +} + +void acpi_nfit_reinstate(void) +{ +uint32_t sig = 0x5449464e; /* "NFIT" */ + +if ( nfit_desc.acpi_table ) 
+write_atomic((uint32_t *)_desc.acpi_table->header.signature[0], + sig); +} + void __init acpi_nfit_boot_init(void) { acpi_status status; @@ -193,6 +211,9 @@ void __init acpi_nfit_boot_init(void) map_pages_to_xen((unsigned long)nfit_desc.acpi_table, PFN_DOWN(nfit_addr), PFN_UP(nfit_addr + nfit_len) - PFN_DOWN(nfit_addr), PAGE_HYPERVISOR); + +/* Hide NFIT from Dom0. */ +acpi_nfit_zap(); } void __init acpi_nfit_init(void) diff
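The zap/reinstate trick above hides the NFIT from Dom0 by rewriting the 4-byte table signature in one 32-bit store: 0x4e494654 is "TFIN" and 0x5449464e is "NFIT" when stored little-endian, so Dom0's ACPI scan no longer matches the table until the original value is written back. A minimal model, with write_atomic() replaced by memcpy() and little-endian byte order assumed (as on x86):

```c
#include <stdint.h>
#include <string.h>

/* Check what a little-endian 32-bit store of v leaves in a signature field. */
static int sig_is(uint32_t v, const char *expected)
{
    char sig[4];

    memcpy(sig, &v, sizeof(v));   /* stands in for write_atomic() */
    return memcmp(sig, expected, 4) == 0;
}
```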
[Xen-devel] [RFC XEN PATCH v3 30/39] tools/libacpi: expose the minimum alignment used by mem_ops.alloc
The AML builder added later needs to allocate contiguous memory across multiple calls to mem_ops.alloc(). Therefore, it needs to know the minimal alignment used by mem_ops.alloc(). Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/firmware/hvmloader/util.c | 2 ++ tools/libacpi/libacpi.h | 2 ++ tools/libxl/libxl_x86_acpi.c| 2 ++ 3 files changed, 6 insertions(+) diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c index 0c3f2d24cd..c2218d9fcb 100644 --- a/tools/firmware/hvmloader/util.c +++ b/tools/firmware/hvmloader/util.c @@ -990,6 +990,8 @@ void hvmloader_acpi_build_tables(struct acpi_config *config, ctxt.mem_ops.free = acpi_mem_free; ctxt.mem_ops.v2p = acpi_v2p; +ctxt.min_alloc_byte_align = 16; + acpi_build_tables(, config); hvm_param_set(HVM_PARAM_VM_GENERATION_ID_ADDR, config->vm_gid_addr); diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h index a2efd23b0b..157f63f7bc 100644 --- a/tools/libacpi/libacpi.h +++ b/tools/libacpi/libacpi.h @@ -52,6 +52,8 @@ struct acpi_ctxt { void (*free)(struct acpi_ctxt *ctxt, void *v, uint32_t size); unsigned long (*v2p)(struct acpi_ctxt *ctxt, void *v); } mem_ops; + +uint32_t min_alloc_byte_align; /* minimum alignment used by mem_ops.alloc */ }; struct acpi_config { diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c index 176175676f..3b79b2179b 100644 --- a/tools/libxl/libxl_x86_acpi.c +++ b/tools/libxl/libxl_x86_acpi.c @@ -183,6 +183,8 @@ int libxl__dom_load_acpi(libxl__gc *gc, libxl_ctxt.c.mem_ops.v2p = virt_to_phys; libxl_ctxt.c.mem_ops.free = acpi_mem_free; +libxl_ctxt.c.min_alloc_byte_align = 16; + rc = init_acpi_config(gc, dom, b_info, ); if (rc) { LOG(ERROR, "init_acpi_config failed (rc=%d)", rc); -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org 
https://lists.xen.org/xen-devel
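The reason the AML builder needs mem_ops.alloc()'s minimum alignment can be shown with a hypothetical bump allocator: the allocator rounds every request up to its alignment internally, so a caller that wants memory contiguous across several calls must pad each request to a multiple of that alignment itself, otherwise a gap appears. This is a sketch, not libacpi code:

```c
#define MIN_ALIGN 16u   /* plays the role of ctxt->min_alloc_byte_align */

static unsigned char heap[4096];
static unsigned long heap_off;

static unsigned long round_up(unsigned long n, unsigned long a)
{
    return (n + a - 1) & ~(a - 1);
}

/* Allocator-internal rounding: each allocation consumes a padded size. */
static void *bump_alloc(unsigned long size)
{
    void *p = &heap[heap_off];

    heap_off += round_up(size, MIN_ALIGN);
    return p;
}

/* A 24-byte request is padded to 32, so the next block starts at +32. */
static int demo_contiguous(void)
{
    unsigned char *a = bump_alloc(24);
    unsigned char *b = bump_alloc(16);

    return b == a + 32;
}
```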
[Xen-devel] [RFC XEN PATCH v3 15/39] x86_64/mm: allow customized location of extended frametable and M2P table
As the existing data in PMEM region is persistent, Xen hypervisor has no knowledge of which part is free to be used for the frame table and M2P table of that PMEM region. Instead, we will allow users or system admins to specify the location of those frame table and M2P table. The location is not necessarily at the beginning of the PMEM region, which is different from the case of hotplugged RAM. This commit adds the support for a customized page allocation function, which is used to allocate the memory for the frame table and M2P table. No page free function is added, and we require that all allocated pages can be reclaimed or has no effect out of memory_add_common(), if memory_add_common() fails. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- xen/arch/x86/x86_64/mm.c | 83 1 file changed, 69 insertions(+), 14 deletions(-) diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c index c8ffafe8a8..d92307ca0b 100644 --- a/xen/arch/x86/x86_64/mm.c +++ b/xen/arch/x86/x86_64/mm.c @@ -106,13 +106,44 @@ struct mem_hotadd_info unsigned long cur; }; +struct mem_hotadd_alloc +{ +/* + * Allocate 2^PAGETABLE_ORDER pages. + * + * No free function is added right now, so we require that all + * allocated pages can be reclaimed easily or has no effect out of + * memory_add_common(), if memory_add_common() fails. + * + * For example, alloc_hotadd_mfn(), which is used in RAM hotplug, + * allocates pages from the hotplugged RAM. If memory_add_common() + * fails, the hotplugged RAM will not be available to Xen, so + * pages allocated by alloc_hotadd_mfns() will never be used and + * have no effect. + * + * Parameters: + * opaque: arguments of the allocator (depending on the implementation) + * + * Return: + * On success, return MFN of the first page. + * Otherwise, return mfn_x(INVALID_MFN). 
+ */ +unsigned long (*alloc_mfns)(void *opaque); + +/* + * Additional arguments passed to @alloc_mfns(). + */ +void *opaque; +}; + static int hotadd_mem_valid(unsigned long pfn, struct mem_hotadd_info *info) { return (pfn < info->epfn && pfn >= info->spfn); } -static unsigned long alloc_hotadd_mfn(struct mem_hotadd_info *info) +static unsigned long alloc_hotadd_mfn(void *opaque) { +struct mem_hotadd_info *info = opaque; unsigned mfn; ASSERT((info->cur + ( 1UL << PAGETABLE_ORDER) < info->epfn) && @@ -315,7 +346,8 @@ static void destroy_m2p_mapping(struct mem_hotadd_info *info) * spfn/epfn: the pfn ranges to be setup * free_s/free_e: the pfn ranges that is free still */ -static int setup_compat_m2p_table(struct mem_hotadd_info *info) +static int setup_compat_m2p_table(struct mem_hotadd_info *info, + struct mem_hotadd_alloc *alloc) { unsigned long i, va, smap, emap, rwva, epfn = info->epfn, mfn; unsigned int n; @@ -369,7 +401,13 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info) if ( n == CNT ) continue; -mfn = alloc_hotadd_mfn(info); +mfn = alloc->alloc_mfns(alloc->opaque); +if ( mfn == mfn_x(INVALID_MFN) ) +{ +err = -ENOMEM; +break; +} + err = map_pages_to_xen(rwva, mfn, 1UL << PAGETABLE_ORDER, PAGE_HYPERVISOR); if ( err ) @@ -389,7 +427,8 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info) * Allocate and map the machine-to-phys table. 
* The L3 for RO/RWRW MPT and the L2 for compatible MPT should be setup already */ -static int setup_m2p_table(struct mem_hotadd_info *info) +static int setup_m2p_table(struct mem_hotadd_info *info, + struct mem_hotadd_alloc *alloc) { unsigned long i, va, smap, emap; unsigned int n; @@ -438,7 +477,13 @@ static int setup_m2p_table(struct mem_hotadd_info *info) break; if ( n < CNT ) { -unsigned long mfn = alloc_hotadd_mfn(info); +unsigned long mfn = alloc->alloc_mfns(alloc->opaque); + +if ( mfn == mfn_x(INVALID_MFN) ) +{ +ret = -ENOMEM; +goto error; +} ret = map_pages_to_xen( RDWR_MPT_VIRT_START + i * sizeof(unsigned long), @@ -483,7 +528,7 @@ static int setup_m2p_table(struct mem_hotadd_info *info) #undef CNT #undef MFN -ret = setup_compat_m2p_table(info); +ret = setup_compat_m2p_table(info, alloc); error: return ret; } @@ -762,7 +807,7 @@ static void cleanup_frame_table(unsigned long spfn, unsigned long epfn) } static int setup_frametable_chunk(vo
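The allocator callback interface introduced above can be exercised with a toy implementation. `struct mem_hotadd_alloc` mirrors the patch; `range_alloc_mfns()` and `demo_alloc()` are illustrative, assuming PAGETABLE_ORDER-sized chunks of 2^9 pages and a caller that checks for INVALID_MFN as the patch requires:

```c
#define INVALID_MFN (~0UL)
#define PAGETABLE_ORDER 9

struct mem_hotadd_alloc {
    unsigned long (*alloc_mfns)(void *opaque);  /* returns first MFN or INVALID_MFN */
    void *opaque;                               /* allocator-private arguments */
};

struct range_alloc { unsigned long cur, end; };

/* Hand out 2^PAGETABLE_ORDER pages at a time from a fixed MFN range. */
static unsigned long range_alloc_mfns(void *opaque)
{
    struct range_alloc *r = opaque;
    unsigned long nr = 1UL << PAGETABLE_ORDER;

    if (r->cur + nr > r->end)
        return INVALID_MFN;
    r->cur += nr;
    return r->cur - nr;
}

/* Drive the callback the way memory_add_common() would. */
static unsigned long demo_alloc(void)
{
    static struct range_alloc r = { 0x1000, 0x1400 };  /* room for two chunks */
    struct mem_hotadd_alloc a = { range_alloc_mfns, &r };

    return a.alloc_mfns(a.opaque);
}
```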
[Xen-devel] [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable
No specification requires that PMEM regions cannot appear in gaps between RAM regions. If that does happen, init_frametable() would need to allocate RAM for the parts of the frame table that cover PMEM regions. However, PMEM regions can be very large (several terabytes or more), so init_frametable() may fail. Because Xen does not use PMEM at boot time, we can defer the actual resource allocation for the frame table of PMEM regions. At boot time, all frame table pages of PMEM regions appearing between RAM regions are mapped to a single RAM page filled with 0xff. Any attempt to write to those frame table pages before their actual resources are allocated implies a bug in Xen. Therefore, a read-only mapping is used here to make such bugs explicit. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: George Dunlap <george.dun...@eu.citrix.com> Cc: Jan Beulich <jbeul...@suse.com> --- xen/arch/x86/mm.c | 117 +- xen/arch/x86/setup.c | 4 ++ xen/drivers/acpi/Makefile | 2 + xen/drivers/acpi/nfit.c | 116 + xen/include/acpi/actbl1.h | 43 + xen/include/xen/acpi.h| 7 +++ 6 files changed, 278 insertions(+), 11 deletions(-) create mode 100644 xen/drivers/acpi/nfit.c diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index e5a029c9be..2fdf609805 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -83,6 +83,9 @@ * an application-supplied buffer). 
*/ +#ifdef CONFIG_NVDIMM_PMEM +#include +#endif #include #include #include @@ -196,31 +199,123 @@ static int __init parse_mmio_relax(const char *s) } custom_param("mmio-relax", parse_mmio_relax); -static void __init init_frametable_chunk(void *start, void *end) +static void __init init_frametable_ram_chunk(unsigned long s, unsigned long e) { -unsigned long s = (unsigned long)start; -unsigned long e = (unsigned long)end; -unsigned long step, mfn; +unsigned long cur, step, mfn; -ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1))); -for ( ; s < e; s += step << PAGE_SHIFT ) +for ( cur = s; cur < e; cur += step << PAGE_SHIFT ) { step = 1UL << (cpu_has_page1gb && - !(s & ((1UL << L3_PAGETABLE_SHIFT) - 1)) ? + !(cur & ((1UL << L3_PAGETABLE_SHIFT) - 1)) ? L3_PAGETABLE_SHIFT - PAGE_SHIFT : L2_PAGETABLE_SHIFT - PAGE_SHIFT); /* * The hardcoded 4 below is arbitrary - just pick whatever you think * is reasonable to waste as a trade-off for using a large page. */ -while ( step && s + (step << PAGE_SHIFT) > e + (4 << PAGE_SHIFT) ) +while ( step && cur + (step << PAGE_SHIFT) > e + (4 << PAGE_SHIFT) ) step >>= PAGETABLE_ORDER; mfn = alloc_boot_pages(step, step); -map_pages_to_xen(s, mfn, step, PAGE_HYPERVISOR); +map_pages_to_xen(cur, mfn, step, PAGE_HYPERVISOR); } -memset(start, 0, end - start); -memset(end, -1, s - e); +memset((void *)s, 0, e - s); +memset((void *)e, -1, cur - e); +} + +#ifdef CONFIG_NVDIMM_PMEM +static void __init init_frametable_pmem_chunk(unsigned long s, unsigned long e) +{ +static unsigned long pmem_init_frametable_mfn; + +ASSERT(!((s | e) & (PAGE_SIZE - 1))); + +if ( !pmem_init_frametable_mfn ) +{ +pmem_init_frametable_mfn = alloc_boot_pages(1, 1); +if ( !pmem_init_frametable_mfn ) +panic("Not enough memory for pmem initial frame table page"); +memset(mfn_to_virt(pmem_init_frametable_mfn), -1, PAGE_SIZE); +} + +while ( s < e ) +{ +/* + * The real frame table entries of a pmem region will be + * created when the pmem region is registered to hypervisor. 
+ * Any write attempt to the initial entries of that pmem + * region implies potential hypervisor bugs. In order to make + * those bugs explicit, map those initial entries as read-only. + */ +map_pages_to_xen(s, pmem_init_frametable_mfn, 1, PAGE_HYPERVISOR_RO); +s += PAGE_SIZE; +} +} +#endif /* CONFIG_NVDIMM_PMEM */ + +static void __init init_frametable_chunk(void *start, void *end) +{ +unsigned long s = (unsigned long)start; +unsigned long e = (unsigned long)end; +#ifdef CONFIG_NVDIMM_PMEM +unsigned long pmem_smfn, pmem_emfn; +unsigned long pmem_spage = s, pmem_epage = s; +unsigned long pmem_page_aligned; +bool found = false; +#endif /* CONFIG_NVDIMM_PMEM */ + +ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1))); + +#ifndef CONFIG_NVDIMM_PMEM +init_frametable_ram_chunk(s, e); +#else +while ( s < e ) +{ +/* No p
[Xen-devel] [RFC XEN PATCH v3 28/39] xen: add hypercall XENMEM_populate_pmem_map
This hypercall will be used by device models to map host PMEM pages to guest. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Daniel De Graaf <dgde...@tycho.nsa.gov> Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> CC: Jan Beulich <jbeul...@suse.com> --- tools/flask/policy/modules/xen.if | 2 +- tools/libxc/include/xenctrl.h | 17 ++ tools/libxc/xc_domain.c | 15 + xen/common/compat/memory.c | 1 + xen/common/memory.c | 44 + xen/include/public/memory.h | 14 +++- xen/include/xsm/dummy.h | 11 ++ xen/include/xsm/xsm.h | 12 ++ xen/xsm/dummy.c | 4 xen/xsm/flask/hooks.c | 13 +++ xen/xsm/flask/policy/access_vectors | 2 ++ 11 files changed, 133 insertions(+), 2 deletions(-) diff --git a/tools/flask/policy/modules/xen.if b/tools/flask/policy/modules/xen.if index 912640002e..9634dee25f 100644 --- a/tools/flask/policy/modules/xen.if +++ b/tools/flask/policy/modules/xen.if @@ -55,7 +55,7 @@ define(`create_domain_common', ` psr_cmt_op psr_cat_op soft_reset }; allow $1 $2:security check_context; allow $1 $2:shadow enable; - allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp }; + allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp populate_pmem_map }; allow $1 $2:grant setup; allow $1 $2:hvm { cacheattr getparam hvmctl sethvmc setparam nested altp2mhvm altp2mhvm_op dm }; diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 41e5e3408c..a81dcdbe58 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2643,6 +2643,23 @@ int xc_nvdimm_pmem_setup_data(xc_interface *xch, unsigned long smfn, unsigned long emfn, unsigned long mgmt_smfn, unsigned long mgmt_emfn); +/* + * Map specified host PMEM pages to the specified guest address. 
+ * + * Parameters: + * xch: xc interface handle + * domid: the target domain id + * mfn: the start MFN of the PMEM pages + * gfn: the start GFN of the target guest physical pages + * nr_mfns: the number of PMEM pages to be mapped + * + * Return: + * On success, return 0. Otherwise, return a non-zero error code. + */ +int xc_domain_populate_pmem_map(xc_interface *xch, uint32_t domid, +unsigned long mfn, unsigned long gfn, +unsigned long nr_mfns); + /* Compat shims */ #include "xenctrl_compat.h" diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 3bab4e8bab..b548da750a 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -2397,6 +2397,21 @@ int xc_domain_soft_reset(xc_interface *xch, domctl.domain = (domid_t)domid; return do_domctl(xch, ); } + +int xc_domain_populate_pmem_map(xc_interface *xch, uint32_t domid, +unsigned long mfn, unsigned long gfn, +unsigned long nr_mfns) +{ +struct xen_pmem_map args = { +.domid = domid, +.mfn = mfn, +.gfn = gfn, +.nr_mfns = nr_mfns, +}; + +return do_memory_op(xch, XENMEM_populate_pmem_map, , sizeof(args)); +} + /* * Local variables: * mode: C diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c index 35bb259808..51bec835b9 100644 --- a/xen/common/compat/memory.c +++ b/xen/common/compat/memory.c @@ -525,6 +525,7 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat) case XENMEM_add_to_physmap: case XENMEM_remove_from_physmap: case XENMEM_access_op: +case XENMEM_populate_pmem_map: break; case XENMEM_get_vnumainfo: diff --git a/xen/common/memory.c b/xen/common/memory.c index 26da6050f6..31ef480562 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -1379,6 +1380,49 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) } #endif +#ifdef CONFIG_NVDIMM_PMEM +case XENMEM_populate_pmem_map: +{ +struct xen_pmem_map map; +struct xen_pmem_map_args args; + +if ( 
copy_from_guest(, arg, 1) ) +return -EFAULT; + +if ( map.domid == DOMID_SELF ) +return -EINVAL; + +d = rcu_lock_domain_by_any_id(map.domid); +if ( !d ) +return -EINVAL; + +rc = xsm_populate_pmem_map(XSM_TARGET, curr_d, d); +if ( rc ) +{ +rcu_unlock_domain(d); +re
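The libxc wrapper shown above amounts to filling one argument struct and issuing do_memory_op() with XENMEM_populate_pmem_map; note the hypervisor side rejects DOMID_SELF, so a device model always names the target domain explicitly. This mock shows the shape only — field types are simplified and do not reproduce the exact public-header layout:

```c
#include <stdint.h>

typedef uint16_t domid_t;

/* Simplified stand-in for the xen_pmem_map argument struct. */
struct xen_pmem_map {
    domid_t domid;         /* target domain (must not be DOMID_SELF) */
    unsigned long mfn;     /* first host PMEM frame */
    unsigned long gfn;     /* first guest frame to map it at */
    unsigned long nr_mfns; /* number of frames to map */
};

static struct xen_pmem_map make_pmem_map(domid_t domid, unsigned long mfn,
                                         unsigned long gfn, unsigned long nr)
{
    struct xen_pmem_map m = {
        .domid = domid, .mfn = mfn, .gfn = gfn, .nr_mfns = nr,
    };

    return m;  /* real code would pass &m to do_memory_op() */
}
```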
[Xen-devel] [RFC XEN PATCH v3 16/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region
Add a command XEN_SYSCTL_nvdimm_pmem_setup to hypercall XEN_SYSCTL_nvdimm_op to setup the frame table and M2P table of a PMEM region. This command is currently used to setup the management PMEM region which is used to store the frame table and M2P table of other PMEM regions and itself. The management PMEM region should not be mapped to guest. PMEM pages are not added in any Xen or domain heaps. A new flag PGC_pmem_page is used to indicate whether a page is from PMEM and avoid returning PMEM pages to heaps. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: George Dunlap <george.dun...@eu.citrix.com> Cc: Jan Beulich <jbeul...@suse.com> --- tools/libxc/include/xenctrl.h | 16 + tools/libxc/xc_misc.c | 34 ++ xen/arch/x86/mm.c | 3 +- xen/arch/x86/x86_64/mm.c | 72 + xen/common/pmem.c | 142 ++ xen/include/asm-x86/mm.h | 10 ++- xen/include/public/sysctl.h | 18 ++ xen/include/xen/pmem.h| 8 +++ 8 files changed, 301 insertions(+), 2 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index d750e67460..7c5707fe11 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2605,6 +2605,22 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type, void *buffer, uint32_t *nr); +/* + * Setup the specified PMEM pages for management usage. If success, + * these PMEM pages can be used to store the frametable and M2P table + * of itself and other PMEM pages. These management PMEM pages will + * never be mapped to guest. + * + * Parameters: + * xch:xc interface handle + * smfn, emfn: the start and end MFN of the PMEM region + * + * Return: + * On success, return 0. Otherwise, return a non-zero error code. 
+ */ +int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch, + unsigned long smfn, unsigned long emfn); + /* Compat shims */ #include "xenctrl_compat.h" diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index f9ce802eda..bebe6d04c8 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -975,6 +975,40 @@ out: return rc; } +static void xc_nvdimm_pmem_setup_common(struct xen_sysctl *sysctl, +unsigned long smfn, unsigned long emfn, +unsigned long mgmt_smfn, +unsigned long mgmt_emfn) +{ +xen_sysctl_nvdimm_op_t *nvdimm = >u.nvdimm; +xen_sysctl_nvdimm_pmem_setup_t *setup = >u.pmem_setup; + +sysctl->cmd = XEN_SYSCTL_nvdimm_op; +nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_setup; +nvdimm->pad = 0; +nvdimm->err = 0; +setup->smfn = smfn; +setup->emfn = emfn; +setup->mgmt_smfn = mgmt_smfn; +setup->mgmt_emfn = mgmt_emfn; +} + +int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch, + unsigned long smfn, unsigned long emfn) +{ +DECLARE_SYSCTL; +int rc; + +xc_nvdimm_pmem_setup_common(, smfn, emfn, smfn, emfn); +sysctl.u.nvdimm.u.pmem_setup.type = PMEM_REGION_TYPE_MGMT; + +rc = do_sysctl(xch, ); +if ( rc && sysctl.u.nvdimm.err ) +rc = -sysctl.u.nvdimm.err; + +return rc; +} + /* * Local variables: * mode: C diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 2fdf609805..93ccf198c9 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -2341,7 +2341,8 @@ void put_page(struct page_info *page) if ( unlikely((nx & PGC_count_mask) == 0) ) { -if ( cleanup_page_cacheattr(page) == 0 ) +if ( !is_pmem_page(page) /* PMEM page is not allocated from Xen heap. 
*/ + && cleanup_page_cacheattr(page) == 0 ) free_domheap_page(page); else gdprintk(XENLOG_WARNING, diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c index d92307ca0b..7dbc5e966c 100644 --- a/xen/arch/x86/x86_64/mm.c +++ b/xen/arch/x86/x86_64/mm.c @@ -1535,6 +1535,78 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm) return ret; } +#ifdef CONFIG_NVDIMM_PMEM + +static void pmem_init_frame_table(unsigned long smfn, unsigned long emfn) +{ +struct page_info *page = mfn_to_page(smfn), *epage = mfn_to_page(emfn); + +while ( page < epage ) +{ +page->count_info = PGC_state_free | PGC_pmem_page; +page++; +} +} + +/** + * Initialize frametable and M2P for the specified PMEM region. + * + * Parameters: + * smfn, emfn: the start and end MFN of the PMEM region + * mgmt_smfn, + * mgmt_emfn: the start and end MFN of the PMEM region used t
[Xen-devel] [RFC XEN PATCH v3 04/39] xen/common: add Kconfig item for pmem support
Add CONFIG_NVDIMM_PMEM to enable NVDIMM persistent memory support. It defaults to 'n'. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: George Dunlap <george.dun...@eu.citrix.com> Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Jan Beulich <jbeul...@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> Cc: Stefano Stabellini <sstabell...@kernel.org> Cc: Tim Deegan <t...@xen.org> Cc: Wei Liu <wei.l...@citrix.com> --- xen/common/Kconfig | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/xen/common/Kconfig b/xen/common/Kconfig index dc8e876439..d4565b1c7b 100644 --- a/xen/common/Kconfig +++ b/xen/common/Kconfig @@ -279,4 +279,12 @@ config CMDLINE_OVERRIDE This is used to work around broken bootloaders. This should be set to 'N' under normal conditions. + +config NVDIMM_PMEM + bool "Persistent memory support" + default n + ---help--- + Enable support for NVDIMM in the persistent memory mode. + + If unsure, say N. endmenu -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC XEN PATCH v3 11/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions
XEN_SYSCTL_nvdimm_pmem_get_regions, which is a command of hypercall XEN_SYSCTL_nvdimm_op, is to get a list of PMEM regions of specified type (see PMEM_REGION_TYPE_*). Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> --- tools/libxc/include/xenctrl.h | 18 tools/libxc/xc_misc.c | 63 xen/common/pmem.c | 67 +++ xen/include/public/sysctl.h | 27 + 4 files changed, 175 insertions(+) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index e4d26967ba..d750e67460 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2587,6 +2587,24 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t domid, int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr); +/* + * Get an array of information of PMEM regions of the specified type. + * + * Parameters: + * xch:xc interface handle + * type: the type of PMEM regions, must be one of PMEM_REGION_TYPE_* + * buffer: the buffer where the information of PMEM regions is returned, + * the caller should allocate enough memory for it. + * nr :IN: the maximum number of PMEM regions that can be returned + * in @buffer + * OUT: the actual number of returned PMEM regions in @buffer + * + * Return: + * On success, return 0. Otherwise, return a non-zero error code. 
+ */ +int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type, + void *buffer, uint32_t *nr); + /* Compat shims */ #include "xenctrl_compat.h" diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index fa66410869..f9ce802eda 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -912,6 +912,69 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr) return rc; } +int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type, + void *buffer, uint32_t *nr) +{ +DECLARE_SYSCTL; +DECLARE_HYPERCALL_BOUNCE(buffer, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT); + +xen_sysctl_nvdimm_op_t *nvdimm = +xen_sysctl_nvdimm_pmem_regions_t *regions = >u.pmem_regions; +unsigned int max; +unsigned long size; +int rc; + +if ( !buffer || !nr ) +return -EINVAL; + +max = *nr; +if ( !max ) +return 0; + +switch ( type ) +{ +case PMEM_REGION_TYPE_RAW: +size = sizeof(xen_sysctl_nvdimm_pmem_raw_region_t) * max; +break; + +default: +return -EINVAL; +} + +HYPERCALL_BOUNCE_SET_SIZE(buffer, size); +if ( xc_hypercall_bounce_pre(xch, buffer) ) +return -EFAULT; + +sysctl.cmd = XEN_SYSCTL_nvdimm_op; +nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_get_regions; +nvdimm->pad = 0; +nvdimm->err = 0; +regions->type = type; +regions->num_regions = max; + +switch ( type ) +{ +case PMEM_REGION_TYPE_RAW: +set_xen_guest_handle(regions->u_buffer.raw_regions, buffer); +break; + +default: +rc = -EINVAL; +goto out; +} + +rc = do_sysctl(xch, ); +if ( !rc ) +*nr = regions->num_regions; +else if ( nvdimm->err ) +rc = -nvdimm->err; + +out: +xc_hypercall_bounce_post(xch, buffer); + +return rc; +} + /* * Local variables: * mode: C diff --git a/xen/common/pmem.c b/xen/common/pmem.c index 995dfcb867..a737e7dc71 100644 --- a/xen/common/pmem.c +++ b/xen/common/pmem.c @@ -22,6 +22,8 @@ #include #include +#include + /* * All PMEM regions presenting in NFIT SPA range structures are linked * in this list. 
@@ -122,6 +124,67 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr) return rc; } +static int pmem_get_raw_regions( +XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) regions, +unsigned int *num_regions) +{ +struct list_head *cur; +unsigned int nr = 0, max = *num_regions; +xen_sysctl_nvdimm_pmem_raw_region_t region; +int rc = 0; + +if ( !guest_handle_okay(regions, max * sizeof(region)) ) +return -EINVAL; + +list_for_each(cur, _raw_regions) +{ +struct pmem *pmem = list_entry(cur, struct pmem, link); + +if ( nr >= max ) +break; + +region.smfn = pmem->smfn; +region.emfn = pmem->emfn; +region.pxm = pmem->u.raw.pxm; + +if ( copy_to_guest_offset(regions, nr, , 1) ) +{ +rc = -EFAULT; +break; +} + +nr++; +} + +*num_regions = nr; + +return rc; +} + +static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions
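The sysctl pair above implies a two-step caller pattern: query the region count first, allocate a buffer of that many entries, then fetch them, with the `nr` parameter acting as IN (buffer capacity) and OUT (entries actually written). A minimal self-contained sketch of that pattern; the `fake_*` functions are stand-ins for the real `xc_nvdimm_pmem_get_regions_nr()`/`xc_nvdimm_pmem_get_regions()` calls, which need a live Xen hypervisor:

```c
#include <stdint.h>
#include <string.h>

struct raw_region { uint64_t smfn, emfn; uint32_t pxm; };

/* Stand-in backing store; in reality this lives in the hypervisor. */
static const struct raw_region backing[] = {
    { 0x100000, 0x180000, 0 },
    { 0x180000, 0x200000, 1 },
};

/* Mimics xc_nvdimm_pmem_get_regions_nr(): report how many regions exist. */
static int fake_get_regions_nr(uint32_t *nr)
{
    if (!nr)
        return -22;             /* -EINVAL */
    *nr = sizeof(backing) / sizeof(backing[0]);
    return 0;
}

/* Mimics xc_nvdimm_pmem_get_regions(): *nr is IN (capacity) / OUT (written). */
static int fake_get_regions(struct raw_region *buf, uint32_t *nr)
{
    uint32_t max, total, copy;

    if (!buf || !nr)
        return -22;
    max = *nr;
    total = sizeof(backing) / sizeof(backing[0]);
    copy = max < total ? max : total;
    memcpy(buf, backing, copy * sizeof(*buf));
    *nr = copy;
    return 0;
}

/* Demo driver: count, then fetch into a buffer sized from the count.
 * Returns the number of regions fetched, or a negative errno. */
static int demo_fetch_all(void)
{
    uint32_t nr = 0;
    struct raw_region buf[8];
    int rc = fake_get_regions_nr(&nr);

    if (rc)
        return rc;
    if (nr > 8)
        nr = 8;
    rc = fake_get_regions(buf, &nr);
    return rc ? rc : (int)nr;
}
```

This mirrors how `handle_list_raw()` and friends in xen-ndctl consume the interface.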
[Xen-devel] [RFC XEN PATCH v3 25/39] tools/xen-ndctl: add option '--data' to command 'list'
If the option '--data' is present, the command 'list' will list all PMEM regions for guest data usage. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/misc/xen-ndctl.c | 40 ++-- 1 file changed, 38 insertions(+), 2 deletions(-) diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c index 320633ae05..33817863ca 100644 --- a/tools/misc/xen-ndctl.c +++ b/tools/misc/xen-ndctl.c @@ -58,10 +58,11 @@ static const struct xen_ndctl_cmd { .name= "list", -.syntax = "[--all | --raw | --mgmt]", +.syntax = "[--all | --raw | --mgmt | --data]", .help= "--all: the default option, list all PMEM regions of following types.\n" "--raw: list all PMEM regions detected by Xen hypervisor.\n" - "--mgmt: list all PMEM regions for management usage.\n", + "--mgmt: list all PMEM regions for management usage.\n" + "--data: list all PMEM regions that can be mapped to guest.\n", .handler = handle_list, .need_xc = true, }, @@ -209,6 +210,40 @@ static int handle_list_mgmt(void) return rc; } +static int handle_list_data(void) +{ +int rc; +unsigned int nr = 0, i; +xen_sysctl_nvdimm_pmem_data_region_t *data_list; + +rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_DATA, ); +if ( rc ) +{ +fprintf(stderr, "Cannot get the number of PMEM regions: %s.\n", +strerror(-rc)); +return rc; +} + +data_list = malloc(nr * sizeof(*data_list)); +if ( !data_list ) +return -ENOMEM; + +rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_DATA, data_list, ); +if ( rc ) +goto out; + +printf("Data PMEM regions:\n"); +for ( i = 0; i < nr; i++ ) +printf(" %u: MFN 0x%lx - 0x%lx, MGMT MFN 0x%lx - 0x%lx\n", + i, data_list[i].smfn, data_list[i].emfn, + data_list[i].mgmt_smfn, data_list[i].mgmt_emfn); + + out: +free(data_list); + +return rc; +} + static const struct list_handlers { const char *option; int (*handler)(void); @@ -216,6 +251,7 @@ static const struct list_handlers { { { "--raw", handle_list_raw }, { 
"--mgmt", handle_list_mgmt }, +{ "--data", handle_list_data }, }; static const unsigned int nr_list_hndrs = -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC XEN PATCH v3 22/39] tools/xen-ndctl: add command 'setup-data'
This command is to query Xen hypervisor to setup the specified PMEM range for guest data usage. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/misc/xen-ndctl.c | 36 1 file changed, 36 insertions(+) diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c index 058f8ccaf5..320633ae05 100644 --- a/tools/misc/xen-ndctl.c +++ b/tools/misc/xen-ndctl.c @@ -37,6 +37,7 @@ static int handle_help(int argc, char *argv[]); static int handle_list(int argc, char *argv[]); static int handle_list_cmds(int argc, char *argv[]); static int handle_setup_mgmt(int argc, char *argv[]); +static int handle_setup_data(int argc, char *argv[]); static const struct xen_ndctl_cmd { @@ -72,6 +73,18 @@ static const struct xen_ndctl_cmd .handler = handle_list_cmds, }, +{ +.name= "setup-data", +.syntax = " ", +.help= "Setup a PMEM region from MFN 'smfn' to 'emfn' for guest data usage,\n" + "which can be used as the backend of the virtual NVDIMM devices.\n\n" + "PMEM pages from MFN 'mgmt_smfn' to 'mgmt_emfn' is used to manage\n" + "the above PMEM region, and should not overlap with MFN from 'smfn'\n" + "to 'emfn'.\n", +.handler = handle_setup_data, +.need_xc = true, +}, + { .name= "setup-mgmt", .syntax = " ", @@ -277,6 +290,29 @@ static int handle_setup_mgmt(int argc, char **argv) return xc_nvdimm_pmem_setup_mgmt(xch, smfn, emfn); } +static int handle_setup_data(int argc, char **argv) +{ +unsigned long smfn, emfn, mgmt_smfn, mgmt_emfn; + +if ( argc < 5 ) +{ +fprintf(stderr, "Too few arguments.\n\n"); +show_help(argv[0]); +return -EINVAL; +} + +if ( !string_to_mfn(argv[1], ) || + !string_to_mfn(argv[2], ) || + !string_to_mfn(argv[3], _smfn) || + !string_to_mfn(argv[4], _emfn) ) +return -EINVAL; + +if ( argc > 5 ) +return handle_unrecognized_argument(argv[0], argv[5]); + +return xc_nvdimm_pmem_setup_data(xch, smfn, emfn, mgmt_smfn, mgmt_emfn); +} + int main(int argc, char *argv[]) { unsigned int 
i; -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC XEN PATCH v3 23/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr
Allow XEN_SYSCTL_nvdimm_pmem_get_regions_nr to return the number of data PMEM regions. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> --- tools/libxc/xc_misc.c | 3 ++- xen/common/pmem.c | 4 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index ef2e9e0656..db74df853a 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -896,7 +896,8 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr) if ( !nr || (type != PMEM_REGION_TYPE_RAW && - type != PMEM_REGION_TYPE_MGMT) ) + type != PMEM_REGION_TYPE_MGMT && + type != PMEM_REGION_TYPE_DATA) ) return -EINVAL; sysctl.cmd = XEN_SYSCTL_nvdimm_op; diff --git a/xen/common/pmem.c b/xen/common/pmem.c index 6891ed7a47..cbe557c220 100644 --- a/xen/common/pmem.c +++ b/xen/common/pmem.c @@ -162,6 +162,10 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr) regions_nr->num_regions = nr_mgmt_regions; break; +case PMEM_REGION_TYPE_DATA: +regions_nr->num_regions = nr_data_regions; +break; + default: rc = -EINVAL; } -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'
The kernel NVDIMM driver and the traditional NVDIMM management utilities in Dom0 does not work now. 'xen-ndctl' is added as an alternatively, which manages NVDIMM via Xen hypercalls. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- .gitignore | 1 + tools/misc/Makefile| 4 ++ tools/misc/xen-ndctl.c | 172 + 3 files changed, 177 insertions(+) create mode 100644 tools/misc/xen-ndctl.c diff --git a/.gitignore b/.gitignore index ecb198f914..30655673f7 100644 --- a/.gitignore +++ b/.gitignore @@ -216,6 +216,7 @@ tools/misc/xen-hvmctx tools/misc/xenlockprof tools/misc/lowmemd tools/misc/xencov +tools/misc/xen-ndctl tools/pkg-config/* tools/qemu-xen-build tools/xentrace/xenalyze diff --git a/tools/misc/Makefile b/tools/misc/Makefile index eaa28793ef..124775b7f4 100644 --- a/tools/misc/Makefile +++ b/tools/misc/Makefile @@ -32,6 +32,7 @@ INSTALL_SBIN += xenpm INSTALL_SBIN += xenwatchdogd INSTALL_SBIN += xen-livepatch INSTALL_SBIN += xen-diag +INSTALL_SBIN += xen-ndctl INSTALL_SBIN += $(INSTALL_SBIN-y) # Everything to be installed in a private bin/ @@ -118,4 +119,7 @@ xen-lowmemd: xen-lowmemd.o xencov: xencov.o $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS) +xen-ndctl: xen-ndctl.o + $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS) + -include $(DEPS_INCLUDE) diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c new file mode 100644 index 00..de40e29ff6 --- /dev/null +++ b/tools/misc/xen-ndctl.c @@ -0,0 +1,172 @@ +/* + * xen-ndctl.c + * + * Xen NVDIMM management tool + * + * Copyright (C) 2017, Intel Corporation + * + * Permission is hereby granted, free of charge, to any person + * obtaining a copy of this software and associated documentation + * files (the "Software"), to deal in the Software without restriction, + * including without limitation the rights to use, copy, modify, merge, + * publish, distribute, sublicense, and/or sell copies of 
the Software, + * and to permit persons to whom the Software is furnished to do so, + * subject to the following conditions: + * + * The above copyright notice and this permission notice shall be + * included in all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + */ + +#include +#include +#include +#include + +static xc_interface *xch; + +static int handle_help(int argc, char *argv[]); +static int handle_list_cmds(int argc, char *argv[]); + +static const struct xen_ndctl_cmd +{ +const char *name; +const char *syntax; +const char *help; +int (*handler)(int argc, char **argv); +bool need_xc; +} cmds[] = +{ +{ +.name= "help", +.syntax = "[command]", +.help= "Show this message or the help message of 'command'.\n" + "Use command 'list-cmds' to list all supported commands.\n", +.handler = handle_help, +}, + +{ +.name= "list-cmds", +.syntax = "", +.help= "List all supported commands.\n", +.handler = handle_list_cmds, +}, +}; + +static const unsigned int nr_cmds = sizeof(cmds) / sizeof(cmds[0]); + +static void show_help(const char *cmd) +{ +unsigned int i; + +if ( !cmd ) +{ +fprintf(stderr, +"Usage: xen-ndctl [args]\n\n" +"List all supported commands by 'xen-ndctl list-cmds'.\n" +"Get help of a command by 'xen-ndctl help '.\n"); +return; +} + +for ( i = 0; i < nr_cmds; i++ ) +if ( !strcmp(cmd, cmds[i].name) ) +{ +fprintf(stderr, "Usage: xen-ndctl %s %s\n\n%s", +cmds[i].name, cmds[i].syntax, cmds[i].help); +break; +} + +if ( i == nr_cmds ) +fprintf(stderr, "Unsupported command '%s'.\n" 
+"List all supported commands by 'xen-ndctl list-cmds'.\n", +cmd); +} + +static int handle_unrecognized_argument(const char *cmd, const char *argv) +{ +fprintf(stderr, "Unrecognized argument: %s.\n\n", argv); +show_help(cmd); + +return -EINVAL
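The `cmds[]` array above drives a simple name-to-handler lookup in `main()`. A self-contained sketch of that dispatch shape (the handler here is a no-op placeholder, not one of the tool's real handlers):

```c
#include <stddef.h>
#include <string.h>

struct cmd {
    const char *name;
    int (*handler)(int argc, char **argv);
};

static int handle_noop(int argc, char **argv)
{
    (void)argc; (void)argv;
    return 0;
}

static const struct cmd cmds[] = {
    { "help",      handle_noop },
    { "list-cmds", handle_noop },
};

/* Look up a command by name and run it; returns -2 (-ENOENT) for an
 * unknown name, mirroring how xen-ndctl walks its cmds[] array. */
static int dispatch(const char *name, int argc, char **argv)
{
    size_t i;

    for (i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++)
        if (!strcmp(name, cmds[i].name))
            return cmds[i].handler(argc, argv);
    return -2;
}
```

Adding a command is then just one more table entry, which is why the later patches in the series ('setup-mgmt', 'setup-data', 'list') each touch only the table plus a handler.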
[Xen-devel] [RFC XEN PATCH v3 26/39] xen/pmem: add function to map PMEM pages to HVM domain
pmem_populate() is added to map the specifed data PMEM pages to a HVM domain. No called is added in this commit. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> --- xen/common/domain.c | 3 ++ xen/common/pmem.c | 141 xen/include/xen/pmem.h | 19 +++ xen/include/xen/sched.h | 3 ++ 4 files changed, 166 insertions(+) diff --git a/xen/common/domain.c b/xen/common/domain.c index 5aebcf265f..4354342b02 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -290,6 +290,9 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags, INIT_PAGE_LIST_HEAD(>page_list); INIT_PAGE_LIST_HEAD(>xenpage_list); +spin_lock_init(>pmem_lock); +INIT_PAGE_LIST_HEAD(>pmem_page_list); + spin_lock_init(>node_affinity_lock); d->node_affinity = NODE_MASK_ALL; d->auto_node_affinity = 1; diff --git a/xen/common/pmem.c b/xen/common/pmem.c index ed4a014c30..2f9ad64a26 100644 --- a/xen/common/pmem.c +++ b/xen/common/pmem.c @@ -17,10 +17,12 @@ */ #include +#include #include #include #include #include +#include #include @@ -78,6 +80,31 @@ static bool check_overlap(unsigned long smfn1, unsigned long emfn1, (emfn1 > smfn2 && emfn1 <= emfn2); } +static bool check_cover(struct list_head *list, +unsigned long smfn, unsigned long emfn) +{ +struct list_head *cur; +struct pmem *pmem; +unsigned long pmem_smfn, pmem_emfn; + +list_for_each(cur, list) +{ +pmem = list_entry(cur, struct pmem, link); +pmem_smfn = pmem->smfn; +pmem_emfn = pmem->emfn; + +if ( smfn < pmem_smfn ) +return false; + +if ( emfn <= pmem_emfn ) +return true; + +smfn = max(smfn, pmem_emfn); +} + +return false; +} + /** * Add a PMEM region to a list. All PMEM regions in the list are * sorted in the ascending order of the start address. 
A PMEM region, @@ -600,6 +627,120 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm) #ifdef CONFIG_X86 +static int pmem_assign_page(struct domain *d, struct page_info *pg, +unsigned long gfn) +{ +int rc; + +if ( pg->count_info != (PGC_state_free | PGC_pmem_page) ) +return -EBUSY; + +pg->count_info = PGC_allocated | PGC_state_inuse | PGC_pmem_page | 1; +pg->u.inuse.type_info = 0; +page_set_owner(pg, d); + +rc = guest_physmap_add_page(d, _gfn(gfn), _mfn(page_to_mfn(pg)), 0); +if ( rc ) +{ +page_set_owner(pg, NULL); +pg->count_info = PGC_state_free | PGC_pmem_page; + +return rc; +} + +spin_lock(>pmem_lock); +page_list_add_tail(pg, >pmem_page_list); +spin_unlock(>pmem_lock); + +return 0; +} + +static int pmem_unassign_page(struct domain *d, struct page_info *pg, + unsigned long gfn) +{ +int rc; + +spin_lock(>pmem_lock); +page_list_del(pg, >pmem_page_list); +spin_unlock(>pmem_lock); + +rc = guest_physmap_remove_page(d, _gfn(gfn), _mfn(page_to_mfn(pg)), 0); + +page_set_owner(pg, NULL); +pg->count_info = PGC_state_free | PGC_pmem_page; + +return 0; +} + +int pmem_populate(struct xen_pmem_map_args *args) +{ +struct domain *d = args->domain; +unsigned long i = args->nr_done; +unsigned long mfn = args->mfn + i; +unsigned long emfn = args->mfn + args->nr_mfns; +unsigned long gfn = args->gfn + i; +struct page_info *page; +int rc = 0, err = 0; + +if ( unlikely(d->is_dying) ) +return -EINVAL; + +if ( !is_hvm_domain(d) ) +return -EINVAL; + +spin_lock(_data_lock); + +if ( !check_cover(_data_regions, mfn, emfn) ) +{ +rc = -ENXIO; +goto out; +} + +for ( ; mfn < emfn; i++, mfn++, gfn++ ) +{ +if ( i != args->nr_done && hypercall_preempt_check() ) +{ +args->preempted = 1; +rc = -ERESTART; +break; +} + +page = mfn_to_page(mfn); +if ( !page_state_is(page, free) ) +{ +rc = -EBUSY; +break; +} + +rc = pmem_assign_page(d, page, gfn); +if ( rc ) +break; +} + + out: +if ( rc && rc != -ERESTART ) +while ( i-- && !err ) +err = pmem_unassign_page(d, mfn_to_page(--mfn), --gfn); + 
+spin_unlock(_data_lock); + +if ( unlikely(err) ) +{ +/* + * If we unfortunately fails to recover from the previous + * failure, some PMEM pages may still be mapped to the + * domain. As pmem_populate() is now called only during domain + * creation, let's crash the domain. +
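check_cover() above walks the sorted region list and succeeds only if the regions contiguously cover [smfn, emfn). An array-based sketch of the same walk, simplified from the `list_head` version so the logic is easy to exercise (the demo table values are illustrative):

```c
#include <stdbool.h>
#include <stddef.h>

struct region { unsigned long smfn, emfn; };

/* Regions must be sorted by start MFN and non-overlapping, as
 * pmem_list_add() guarantees for the data-region list. */
static bool check_cover(const struct region *r, size_t n,
                        unsigned long smfn, unsigned long emfn)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (smfn < r[i].smfn)
            return false;        /* gap before this region */
        if (emfn <= r[i].emfn)
            return true;         /* fully covered by regions so far */
        if (smfn < r[i].emfn)
            smfn = r[i].emfn;    /* advance past this region */
    }
    return false;                /* ran off the end of the list */
}

/* Demo table: two contiguous regions, then a gap, then a third. */
static bool demo_cover(unsigned long s, unsigned long e)
{
    static const struct region rs[] = { {10, 20}, {20, 30}, {40, 50} };
    return check_cover(rs, 3, s, e);
}
```

Note that a range spanning two adjacent regions passes, while any range crossing the gap fails, which is exactly the property `pmem_populate()` relies on before assigning pages.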
[Xen-devel] [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table
cleanup_frame_table() initializes the entire newly added frame table to all -1's. If it's called after extend_frame_table() failed to map the entire frame table, the initialization will hit a page fault. Move the cleanup of partially mapped frametable to extend_frame_table(), which has enough knowledge of the mapping status. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- xen/arch/x86/x86_64/mm.c | 51 ++-- 1 file changed, 28 insertions(+), 23 deletions(-) diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c index c93383d7d9..f635e4bf70 100644 --- a/xen/arch/x86/x86_64/mm.c +++ b/xen/arch/x86/x86_64/mm.c @@ -710,15 +710,12 @@ void free_compat_arg_xlat(struct vcpu *v) PFN_UP(COMPAT_ARG_XLAT_SIZE)); } -static void cleanup_frame_table(struct mem_hotadd_info *info) +static void cleanup_frame_table(unsigned long spfn, unsigned long epfn) { +struct mem_hotadd_info info = { .spfn = spfn, .epfn = epfn, .cur = spfn }; unsigned long sva, eva; l3_pgentry_t l3e; l2_pgentry_t l2e; -unsigned long spfn, epfn; - -spfn = info->spfn; -epfn = info->epfn; sva = (unsigned long)mfn_to_page(spfn); eva = (unsigned long)mfn_to_page(epfn); @@ -744,7 +741,7 @@ static void cleanup_frame_table(struct mem_hotadd_info *info) if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) == (_PAGE_PSE | _PAGE_PRESENT) ) { -if (hotadd_mem_valid(l2e_get_pfn(l2e), info)) +if ( hotadd_mem_valid(l2e_get_pfn(l2e), ) ) destroy_xen_mappings(sva & ~((1UL << L2_PAGETABLE_SHIFT) - 1), ((sva & ~((1UL << L2_PAGETABLE_SHIFT) -1 )) + (1UL << L2_PAGETABLE_SHIFT) - 1)); @@ -769,28 +766,33 @@ static int setup_frametable_chunk(void *start, void *end, { unsigned long s = (unsigned long)start; unsigned long e = (unsigned long)end; -unsigned long mfn; -int err; +unsigned long cur, mfn; +int err = 0; ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1))); ASSERT(!(e & ((1 << L2_PAGETABLE_SHIFT) - 1))); -for ( ; s < e; s += 
(1UL << L2_PAGETABLE_SHIFT)) +for ( cur = s; cur < e; cur += (1UL << L2_PAGETABLE_SHIFT) ) { mfn = alloc_hotadd_mfn(info); -err = map_pages_to_xen(s, mfn, 1UL << PAGETABLE_ORDER, +err = map_pages_to_xen(cur, mfn, 1UL << PAGETABLE_ORDER, PAGE_HYPERVISOR); if ( err ) -return err; +break; } -memset(start, -1, s - (unsigned long)start); -return 0; +if ( !err ) +memset(start, -1, cur - s); +else +destroy_xen_mappings(s, cur); + +return err; } static int extend_frame_table(struct mem_hotadd_info *info) { unsigned long cidx, nidx, eidx, spfn, epfn; +int err = 0; spfn = info->spfn; epfn = info->epfn; @@ -809,8 +811,6 @@ static int extend_frame_table(struct mem_hotadd_info *info) while ( cidx < eidx ) { -int err; - nidx = find_next_bit(pdx_group_valid, eidx, cidx); if ( nidx >= eidx ) nidx = eidx; @@ -818,14 +818,19 @@ static int extend_frame_table(struct mem_hotadd_info *info) pdx_to_page(nidx * PDX_GROUP_COUNT), info); if ( err ) -return err; +break; cidx = find_next_zero_bit(pdx_group_valid, eidx, nidx); } -memset(mfn_to_page(spfn), 0, - (unsigned long)mfn_to_page(epfn) - (unsigned long)mfn_to_page(spfn)); -return 0; +if ( !err ) +memset(mfn_to_page(spfn), 0, + (unsigned long)mfn_to_page(epfn) - + (unsigned long)mfn_to_page(spfn)); +else +cleanup_frame_table(spfn, pdx_to_pfn(cidx * PDX_GROUP_COUNT)); + +return err; } void __init subarch_init_memory(void) @@ -1404,8 +1409,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm) info.cur = spfn; ret = extend_frame_table(); -if (ret) -goto destroy_frametable; +if ( ret ) +goto restore_node_status; /* Set max_page as setup_m2p_table will use it*/ if (max_page < epfn) @@ -1448,8 +1453,8 @@ destroy_m2p: max_page = old_max; total_pages = old_total; max_pdx = pfn_to_pdx(max_page - 1) + 1; -destroy_frametable: -cleanup_frame_table(); +cleanup_frame_table(spfn, epfn); +restore_node_status: if ( !orig_online ) node_set_offline(node); NODE_DATA(node)->node_start_pfn = old_node_start; -- 2.14.1 ___ Xen-devel 
mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor
Register valid PMEM regions probed via NFIT to Xen hypervisor. No frametable and M2P table are created for those PMEM regions at this stage. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> --- xen/common/Makefile | 1 + xen/common/pmem.c | 130 xen/drivers/acpi/nfit.c | 12 - xen/include/xen/pmem.h | 28 +++ 4 files changed, 170 insertions(+), 1 deletion(-) create mode 100644 xen/common/pmem.c create mode 100644 xen/include/xen/pmem.h diff --git a/xen/common/Makefile b/xen/common/Makefile index 39e2614546..46f9d1f57f 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -29,6 +29,7 @@ obj-y += notifier.o obj-y += page_alloc.o obj-$(CONFIG_HAS_PDX) += pdx.o obj-$(CONFIG_PERF_COUNTERS) += perfc.o +obj-${CONFIG_NVDIMM_PMEM} += pmem.o obj-y += preempt.o obj-y += random.o obj-y += rangeset.o diff --git a/xen/common/pmem.c b/xen/common/pmem.c new file mode 100644 index 00..49648222a6 --- /dev/null +++ b/xen/common/pmem.c @@ -0,0 +1,130 @@ +/* + * xen/common/pmem.c + * + * Copyright (C) 2017, Intel Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms and conditions of the GNU General Public + * License, version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; If not, see <http://www.gnu.org/licenses/>. + */ + +#include +#include +#include + +/* + * All PMEM regions presenting in NFIT SPA range structures are linked + * in this list. 
+ */ +static LIST_HEAD(pmem_raw_regions); +static unsigned int nr_raw_regions; + +struct pmem { +struct list_head link; /* link to one of PMEM region list */ +unsigned long smfn;/* start MFN of the PMEM region */ +unsigned long emfn;/* end MFN of the PMEM region */ + +union { +struct { +unsigned int pxm; /* proximity domain of the PMEM region */ +} raw; +} u; +}; + +static bool check_overlap(unsigned long smfn1, unsigned long emfn1, + unsigned long smfn2, unsigned long emfn2) +{ +return (smfn1 >= smfn2 && smfn1 < emfn2) || + (emfn1 > smfn2 && emfn1 <= emfn2); +} + +/** + * Add a PMEM region to a list. All PMEM regions in the list are + * sorted in the ascending order of the start address. A PMEM region, + * whose range is overlapped with anyone in the list, cannot be added + * to the list. + * + * Parameters: + * list: the list to which a new PMEM region will be added + * smfn, emfn: the range of the new PMEM region + * entry: return the new entry added to the list + * + * Return: + * On success, return 0 and the new entry added to the list is + * returned via @entry. Otherwise, return an error number and the + * value of @entry is undefined. + */ +static int pmem_list_add(struct list_head *list, + unsigned long smfn, unsigned long emfn, + struct pmem **entry) +{ +struct list_head *cur; +struct pmem *new_pmem; +int rc = 0; + +list_for_each_prev(cur, list) +{ +struct pmem *cur_pmem = list_entry(cur, struct pmem, link); +unsigned long cur_smfn = cur_pmem->smfn; +unsigned long cur_emfn = cur_pmem->emfn; + +if ( check_overlap(smfn, emfn, cur_smfn, cur_emfn) ) +{ +rc = -EEXIST; +goto out; +} + +if ( cur_smfn < smfn ) +break; +} + +new_pmem = xzalloc(struct pmem); +if ( !new_pmem ) +{ +rc = -ENOMEM; +goto out; +} +new_pmem->smfn = smfn; +new_pmem->emfn = emfn; +list_add(_pmem->link, cur); + + out: +if ( !rc && entry ) +*entry = new_pmem; + +return rc; +} + +/** + * Register a pmem region to Xen. 
+ * + * Parameters: + * smfn, emfn: start and end MFNs of the pmem region + * pxm:the proximity domain of the pmem region + * + * Return: + * On success, return 0. Otherwise, an error number is returned. + */ +int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm) +{ +int rc; +struct pmem *pmem; + +if ( smfn >= emfn ) +return -EINVAL; + +rc = pmem_list_add(_raw_regions, smfn, emfn, ); +if ( !rc ) +pmem->u.raw.pxm = pxm; +nr_raw_regions++; + +return rc; +} diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c index b88a587b8d..68750c2edc 100644 --- a/xen/drivers/acpi/nfit.c +++ b/xen/drivers/acpi
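pmem_list_add() above keeps the region list sorted by start MFN and rejects overlapping ranges. A self-contained, array-based sketch of the same policy; it uses the canonical half-open-interval overlap test rather than the patch's two-clause form, and the fixed-size array stands in for the linked list:

```c
#include <errno.h>
#include <string.h>

#define MAX_REGIONS 16

struct pmem_region { unsigned long smfn, emfn; };

static struct pmem_region regions[MAX_REGIONS];
static unsigned int nr_regions;

/* Canonical overlap test for half-open intervals [s1,e1) and [s2,e2). */
static int overlaps(unsigned long s1, unsigned long e1,
                    unsigned long s2, unsigned long e2)
{
    return s1 < e2 && s2 < e1;
}

/* Insert [smfn, emfn) keeping the array sorted by start MFN; reject
 * empty (-EINVAL) or overlapping (-EEXIST) ranges, mirroring the
 * error codes of pmem_list_add()/pmem_register(). */
static int region_add(unsigned long smfn, unsigned long emfn)
{
    unsigned int i;

    if (smfn >= emfn)
        return -EINVAL;
    if (nr_regions == MAX_REGIONS)
        return -ENOMEM;

    for (i = 0; i < nr_regions; i++) {
        if (overlaps(smfn, emfn, regions[i].smfn, regions[i].emfn))
            return -EEXIST;
        if (regions[i].smfn > smfn)
            break;               /* found the insertion point */
    }

    memmove(&regions[i + 1], &regions[i],
            (nr_regions - i) * sizeof(regions[0]));
    regions[i].smfn = smfn;
    regions[i].emfn = emfn;
    nr_regions++;
    return 0;
}
```

Keeping the list sorted is what later lets `check_cover()` verify contiguous coverage in a single forward pass.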
[Xen-devel] [RFC XEN PATCH v3 19/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions
Allow XEN_SYSCTL_nvdimm_pmem_get_regions to return a list of management PMEM regions. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> --- tools/libxc/xc_misc.c | 8 xen/common/pmem.c | 45 + xen/include/public/sysctl.h | 11 +++ 3 files changed, 64 insertions(+) diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index 4b5558aaa5..3ad254f5ae 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -939,6 +939,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type, size = sizeof(xen_sysctl_nvdimm_pmem_raw_region_t) * max; break; +case PMEM_REGION_TYPE_MGMT: +size = sizeof(xen_sysctl_nvdimm_pmem_mgmt_region_t) * max; +break; + default: return -EINVAL; } @@ -960,6 +964,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type, set_xen_guest_handle(regions->u_buffer.raw_regions, buffer); break; +case PMEM_REGION_TYPE_MGMT: +set_xen_guest_handle(regions->u_buffer.mgmt_regions, buffer); +break; + default: rc = -EINVAL; goto out; diff --git a/xen/common/pmem.c b/xen/common/pmem.c index 54b3e7119a..dcd8160407 100644 --- a/xen/common/pmem.c +++ b/xen/common/pmem.c @@ -190,6 +190,47 @@ static int pmem_get_raw_regions( return rc; } +static int pmem_get_mgmt_regions( +XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) regions, +unsigned int *num_regions) +{ +struct list_head *cur; +unsigned int nr = 0, max = *num_regions; +xen_sysctl_nvdimm_pmem_mgmt_region_t region; +int rc = 0; + +if ( !guest_handle_okay(regions, max * sizeof(region)) ) +return -EINVAL; + +spin_lock(_mgmt_lock); + +list_for_each(cur, _mgmt_regions) +{ +struct pmem *pmem = list_entry(cur, struct pmem, link); + +if ( nr >= max ) +break; + +region.smfn = pmem->smfn; +region.emfn = pmem->emfn; +region.used_mfns = pmem->u.mgmt.used; + +if ( copy_to_guest_offset(regions, nr, , 1) ) +{ +rc = -EFAULT; 
+break; +} + +nr++; +} + +spin_unlock(_mgmt_lock); + +*num_regions = nr; + +return rc; +} + static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions) { unsigned int type = regions->type, max = regions->num_regions; @@ -204,6 +245,10 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions) rc = pmem_get_raw_regions(regions->u_buffer.raw_regions, ); break; +case PMEM_REGION_TYPE_MGMT: +rc = pmem_get_mgmt_regions(regions->u_buffer.mgmt_regions, ); +break; + default: rc = -EINVAL; } diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h index 5d208033a0..f825716446 100644 --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -1131,6 +1131,15 @@ struct xen_sysctl_nvdimm_pmem_raw_region { typedef struct xen_sysctl_nvdimm_pmem_raw_region xen_sysctl_nvdimm_pmem_raw_region_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_raw_region_t); +/* PMEM_REGION_TYPE_MGMT */ +struct xen_sysctl_nvdimm_pmem_mgmt_region { +uint64_t smfn; +uint64_t emfn; +uint64_t used_mfns; +}; +typedef struct xen_sysctl_nvdimm_pmem_mgmt_region xen_sysctl_nvdimm_pmem_mgmt_region_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_mgmt_region_t); + /* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */ struct xen_sysctl_nvdimm_pmem_regions_nr { uint8_t type; /* IN: one of PMEM_REGION_TYPE_* */ @@ -1149,6 +1158,8 @@ struct xen_sysctl_nvdimm_pmem_regions { union { /* if type == PMEM_REGION_TYPE_RAW */ XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) raw_regions; +/* if type == PMEM_REGION_TYPE_MGMT */ +XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) mgmt_regions; } u_buffer; /* IN: the guest handler where the entries of PMEM regions of the type @type are returned */ }; -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()
The current check refuses the hot-plugged memory that falls in one unused PDX group, which should be allowed. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- xen/arch/x86/x86_64/mm.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c index 11746730b4..6c5221f90c 100644 --- a/xen/arch/x86/x86_64/mm.c +++ b/xen/arch/x86/x86_64/mm.c @@ -1296,12 +1296,8 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn) return 0; /* Make sure the new range is not present now */ -sidx = ((pfn_to_pdx(spfn) + PDX_GROUP_COUNT - 1) & ~(PDX_GROUP_COUNT - 1)) -/ PDX_GROUP_COUNT; +sidx = (pfn_to_pdx(spfn) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT; eidx = (pfn_to_pdx(epfn - 1) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT; -if (sidx >= eidx) -return 0; - s = find_next_zero_bit(pdx_group_valid, eidx, sidx); if ( s > eidx ) return 0; -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [RFC XEN PATCH v3 10/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr
XEN_SYSCTL_nvdimm_pmem_get_rgions_nr, which is a command of hypercall XEN_SYSCTL_nvdimm_op, is to get the number of PMEM regions of the specified type (see PMEM_REGION_TYPE_*). Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> --- tools/libxc/include/xenctrl.h | 15 +++ tools/libxc/xc_misc.c | 24 xen/common/pmem.c | 29 - xen/include/public/sysctl.h | 16 ++-- 4 files changed, 81 insertions(+), 3 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 43151cb415..e4d26967ba 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2572,6 +2572,21 @@ int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout); int xc_domain_cacheflush(xc_interface *xch, uint32_t domid, xen_pfn_t start_pfn, xen_pfn_t nr_pfns); +/* + * Get the number of PMEM regions of the specified type. + * + * Parameters: + * xch: xc interface handle + * type: the type of PMEM regions, must be one of PMEM_REGION_TYPE_* + * nr: the number of PMEM regions is returned via this parameter + * + * Return: + * On success, return 0 and the number of PMEM regions is returned via @nr. + * Otherwise, return a non-zero error code. 
+ */ +int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, + uint8_t type, uint32_t *nr); + /* Compat shims */ #include "xenctrl_compat.h" diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index 7e15e904e3..fa66410869 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -888,6 +888,30 @@ int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout) return _xc_livepatch_action(xch, name, LIVEPATCH_ACTION_REPLACE, timeout); } +int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr) +{ +DECLARE_SYSCTL; +xen_sysctl_nvdimm_op_t *nvdimm = +int rc; + +if ( !nr || type != PMEM_REGION_TYPE_RAW ) +return -EINVAL; + +sysctl.cmd = XEN_SYSCTL_nvdimm_op; +nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_get_regions_nr; +nvdimm->pad = 0; +nvdimm->u.pmem_regions_nr.type = type; +nvdimm->err = 0; + +rc = do_sysctl(xch, ); +if ( !rc ) +*nr = nvdimm->u.pmem_regions_nr.num_regions; +else if ( nvdimm->err ) +rc = nvdimm->err; + +return rc; +} + /* * Local variables: * mode: C diff --git a/xen/common/pmem.c b/xen/common/pmem.c index d67f237cd5..995dfcb867 100644 --- a/xen/common/pmem.c +++ b/xen/common/pmem.c @@ -105,6 +105,23 @@ static int pmem_list_add(struct list_head *list, return rc; } +static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr) +{ +int rc = 0; + +switch ( regions_nr->type ) +{ +case PMEM_REGION_TYPE_RAW: +regions_nr->num_regions = nr_raw_regions; +break; + +default: +rc = -EINVAL; +} + +return rc; +} + /** * Register a pmem region to Xen. 
* @@ -142,7 +159,17 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm) */ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm) { -int rc = -ENOSYS; +int rc; + +switch ( nvdimm->cmd ) +{ +case XEN_SYSCTL_nvdimm_pmem_get_regions_nr: +rc = pmem_get_regions_nr(>u.pmem_regions_nr); +break; + +default: +rc = -ENOSYS; +} nvdimm->err = -rc; diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h index e8272ae968..cf308bbc45 100644 --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -1118,11 +1118,23 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t); * Interface for NVDIMM management. */ +/* Types of PMEM regions */ +#define PMEM_REGION_TYPE_RAW0 /* PMEM regions detected by Xen */ + +/* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */ +struct xen_sysctl_nvdimm_pmem_regions_nr { +uint8_t type; /* IN: one of PMEM_REGION_TYPE_* */ +uint32_t num_regions; /* OUT: the number of PMEM regions of type @type */ +}; +typedef struct xen_sysctl_nvdimm_pmem_regions_nr xen_sysctl_nvdimm_pmem_regions_nr_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_nr_t); + struct xen_sysctl_nvdimm_op { -uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */ +uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*. */ +#define XEN_SYSCTL_nvdimm_pmem_get_regions_nr 0 uint32_t pad; /* IN: Always zero. */ union { -/* Parameters of XEN_SYSCTL_nvdimm_* will be added here. */ +xen_sysctl_nvdimm_pmem_regions_nr_t pmem_regions_nr; } u; uint32_t err; /* OUT: error code */ }; -- 2.14.1 __
[Xen-devel] [RFC XEN PATCH v3 21/39] xen/pmem: support setting up PMEM regions for guest data usage
Allow the command XEN_SYSCTL_nvdimm_pmem_setup of hypercall XEN_SYSCTL_nvdimm_op to setup a PMEM region for guest data usage. After the setup, that PMEM region will be able to be mapped to guest address space. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> --- tools/libxc/include/xenctrl.h | 22 tools/libxc/xc_misc.c | 17 ++ xen/common/pmem.c | 118 +- xen/include/public/sysctl.h | 3 +- 4 files changed, 157 insertions(+), 3 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 7c5707fe11..41e5e3408c 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2621,6 +2621,28 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type, int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch, unsigned long smfn, unsigned long emfn); +/* + * Setup the specified PMEM pages for guest data usage. If success, + * these PMEM page can be mapped to guest and be used as the backend + * of vNDIMM devices. + * + * Parameters: + * xch:xc interface handle + * smfn, emfn: the start and end of the PMEM region + * mgmt_smfn, + + * mgmt_emfn: the start and the end MFN of the PMEM region that is + * used to manage this PMEM region. It must be in one of + * those added by xc_nvdimm_pmem_setup_mgmt() calls, and + * not overlap with @smfn - @emfn. + * + * Return: + * On success, return 0. Otherwise, return a non-zero error code. 
+ */ +int xc_nvdimm_pmem_setup_data(xc_interface *xch, + unsigned long smfn, unsigned long emfn, + unsigned long mgmt_smfn, unsigned long mgmt_emfn); + /* Compat shims */ #include "xenctrl_compat.h" diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index 3ad254f5ae..ef2e9e0656 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -1019,6 +1019,23 @@ int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch, return rc; } +int xc_nvdimm_pmem_setup_data(xc_interface *xch, + unsigned long smfn, unsigned long emfn, + unsigned long mgmt_smfn, unsigned long mgmt_emfn) +{ +DECLARE_SYSCTL; +int rc; + +xc_nvdimm_pmem_setup_common(, smfn, emfn, mgmt_smfn, mgmt_emfn); +sysctl.u.nvdimm.u.pmem_setup.type = PMEM_REGION_TYPE_DATA; + +rc = do_sysctl(xch, ); +if ( rc && sysctl.u.nvdimm.err ) +rc = -sysctl.u.nvdimm.err; + +return rc; +} + /* * Local variables: * mode: C diff --git a/xen/common/pmem.c b/xen/common/pmem.c index dcd8160407..6891ed7a47 100644 --- a/xen/common/pmem.c +++ b/xen/common/pmem.c @@ -34,16 +34,26 @@ static unsigned int nr_raw_regions; /* * All PMEM regions reserved for management purpose are linked to this * list. All of them must be covered by one or multiple PMEM regions - * in list pmem_raw_regions. + * in list pmem_raw_regions, and not appear in list pmem_data_regions. */ static LIST_HEAD(pmem_mgmt_regions); static DEFINE_SPINLOCK(pmem_mgmt_lock); static unsigned int nr_mgmt_regions; +/* + * All PMEM regions that can be mapped to guest are linked to this + * list. All of them must be covered by one or multiple PMEM regions + * in list pmem_raw_regions, and not appear in list pmem_mgmt_regions. 
+ */ +static LIST_HEAD(pmem_data_regions); +static DEFINE_SPINLOCK(pmem_data_lock); +static unsigned int nr_data_regions; + struct pmem { struct list_head link; /* link to one of PMEM region list */ unsigned long smfn;/* start MFN of the PMEM region */ unsigned long emfn;/* end MFN of the PMEM region */ +spinlock_t lock; union { struct { @@ -53,6 +63,11 @@ struct pmem { struct { unsigned long used; /* # of used pages in MGMT PMEM region */ } mgmt; + +struct { +unsigned long mgmt_smfn; /* start MFN of management region */ +unsigned long mgmt_emfn; /* end MFN of management region */ +} data; } u; }; @@ -111,6 +126,7 @@ static int pmem_list_add(struct list_head *list, } new_pmem->smfn = smfn; new_pmem->emfn = emfn; +spin_lock_init(_pmem->lock); list_add(_pmem->link, cur); out: @@ -261,9 +277,16 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions) static bool check_mgmt_size(unsigned long mgmt_mfns, unsigned long total_mfns) { -return mgmt_mfns >= +unsigned long required = ((sizeof(struct page_info) * total_mfns) >> PAGE_SHIFT) + ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT); + +if ( required > mgmt_mfns ) +printk(XEN
[Xen-devel] [RFC XEN PATCH v3 17/39] tools/xen-ndctl: add command 'setup-mgmt'
This command is to query Xen hypervisor to setup the specified PMEM range for the management usage. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/misc/xen-ndctl.c | 45 + 1 file changed, 45 insertions(+) diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c index 6277a1eda2..1289a83dbe 100644 --- a/tools/misc/xen-ndctl.c +++ b/tools/misc/xen-ndctl.c @@ -36,6 +36,7 @@ static xc_interface *xch; static int handle_help(int argc, char *argv[]); static int handle_list(int argc, char *argv[]); static int handle_list_cmds(int argc, char *argv[]); +static int handle_setup_mgmt(int argc, char *argv[]); static const struct xen_ndctl_cmd { @@ -69,6 +70,14 @@ static const struct xen_ndctl_cmd .help= "List all supported commands.\n", .handler = handle_list_cmds, }, + +{ +.name= "setup-mgmt", +.syntax = " ", +.help= "Setup a PMEM region from MFN 'smfn' to 'emfn' for management usage.\n\n", +.handler = handle_setup_mgmt, +.need_xc = true, +}, }; static const unsigned int nr_cmds = sizeof(cmds) / sizeof(cmds[0]); @@ -197,6 +206,42 @@ static int handle_list_cmds(int argc, char *argv[]) return 0; } +static bool string_to_mfn(const char *str, unsigned long *ret) +{ +unsigned long l; + +errno = 0; +l = strtoul(str, NULL, 0); + +if ( !errno ) +*ret = l; +else +fprintf(stderr, "Invalid MFN %s: %s\n", str, strerror(errno)); + +return !errno; +} + +static int handle_setup_mgmt(int argc, char **argv) +{ +unsigned long smfn, emfn; + +if ( argc < 3 ) +{ +fprintf(stderr, "Too few arguments.\n\n"); +show_help(argv[0]); +return -EINVAL; +} + +if ( !string_to_mfn(argv[1], ) || + !string_to_mfn(argv[2], ) ) +return -EINVAL; + +if ( argc > 3 ) +return handle_unrecognized_argument(argv[0], argv[3]); + +return xc_nvdimm_pmem_setup_mgmt(xch, smfn, emfn); +} + int main(int argc, char *argv[]) { unsigned int i; -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org 
https://lists.xen.org/xen-devel
[Xen-devel] [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains
Overview == (RFC v2 can be found at https://lists.xen.org/archives/html/xen-devel/2017-03/msg02401.html) This RFC v3 changes and grows considerably compared to the previous versions. The primary changes are listed below; most of them are meant to simplify the first implementation and avoid further growth. 1. Drop the support for maintaining the frametable and M2P table of PMEM in RAM. We may add this support back in the future. 2. Hide the host NFIT and deny access to host PMEM from Dom0. In other words, the kernel NVDIMM driver cannot be loaded in Dom0, and existing management utilities (e.g. ndctl) no longer work in Dom0. This works around the interference between PMEM accesses from Dom0 and from the Xen hypervisor. In the future, we may add a stub driver in Dom0 which will hold the PMEM pages being used by the Xen hypervisor and/or other domains. 3. As there is now no NVDIMM driver or management utility in Dom0, we cannot easily specify an area of host NVDIMM (e.g., by /dev/pmem0) or manage NVDIMM in Dom0 (e.g., creating labels). Instead, we have to specify the exact MFNs of host PMEM pages in xl domain configuration files and in the newly added Xen NVDIMM management utility xen-ndctl. If some tasks do have to be handled by the existing driver and management utilities, such as recovery from hardware failures, they have to be accomplished outside the Xen environment. Once 2. is solved in the future, we will be able to make the existing driver and management utilities work in Dom0 again. All patches can be found at Xen: https://github.com/hzzhan9/xen.git nvdimm-rfc-v3 QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3 How to Test == 1. Build and install this patchset along with the associated QEMU patches. 2. Use xen-ndctl to get the list of PMEM regions detected by the Xen hypervisor, e.g. # xen-ndctl list --raw Raw PMEM regions: 0: MFN 0x48 - 0x88, PXM 3 which indicates a PMEM region is present at MFN 0x48 - 0x88. 3. Set up a management area to manage the guest data areas.
# xen-ndctl setup-mgmt 0x48 0x4c # xen-ndctl list --mgmt Management PMEM regions: 0: MFN 0x48 - 0x4c, used 0xc00 The first command sets up the PMEM area from MFN 0x48 to 0x4c (1 GB) as a management area, which is also used to manage itself. The second command lists all management areas; the 'used' field shows the number of pages that have been used from the beginning of that area. The size ratio between a management area and the areas it manages (including itself) should be at least 1 : 100 (i.e., 32 bytes for the frametable and 8 bytes for the M2P table per page). The size of a management area, as well as of a data area below, is currently restricted to 256 MBytes or a multiple thereof; the alignment is restricted to 2 MBytes or a multiple thereof. 4. Set up a data area that can be used by guests. # xen-ndctl setup-data 0x4c 0x88 0x480c00 0x4c # xen-ndctl list --data Data PMEM regions: 0: MFN 0x4c - 0x88, MGMT MFN 0x480c00 - 0x48b000 The first command sets up the remaining PMEM pages from MFN 0x4c to 0x88 as a data area. The management area from MFN 0x480c00 to 0x4c is specified to manage this data area. The management pages actually used can be found with the second command. 5. Assign data pages to an HVM domain by adding the following line to the domain configuration. vnvdimms = [ 'type=mfn, backend=0x4c, nr_pages=0x10' ] which assigns 4 GBytes of PMEM starting from MFN 0x4c to that domain. A 4 GByte PMEM device should be present in the guest (e.g., as /dev/pmem0) after the setup steps above. There can be one or multiple entries in vnvdimms; they must not overlap with each other. Sharing PMEM pages between domains is not supported, so the PMEM pages assigned to different domains must not overlap.
Bug fix and code cleanup [01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check() [02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table() [03/39] x86_64/mm: avoid cleaning the unmapped frame table - Part 1. Detect host PMEM Detect host PMEM via NFIT. No frametable and M2P table for them are created in this part. [04/39] xen/common: add Kconfig item for pmem support [05/39] x86/mm: exclude PMEM regions from initial frametable [06/39] acpi: probe valid PMEM regions via NFIT [07/39] xen/pmem: register valid PMEM regions to Xen hypervisor [08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0 [09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op [10/39] xen/pmem: add
[Xen-devel] [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
XEN_SYSCTL_nvdimm_op will support a set of sub-commands to manage the physical NVDIMM devices. This commit just adds the framework for this hypercall, and does not implement any sub-commands. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Daniel De Graaf <dgde...@tycho.nsa.gov> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> --- tools/flask/policy/modules/dom0.te | 2 +- xen/common/pmem.c | 18 ++ xen/common/sysctl.c | 9 + xen/include/public/sysctl.h | 19 ++- xen/include/xen/pmem.h | 2 ++ xen/xsm/flask/hooks.c | 4 xen/xsm/flask/policy/access_vectors | 2 ++ 7 files changed, 54 insertions(+), 2 deletions(-) diff --git a/tools/flask/policy/modules/dom0.te b/tools/flask/policy/modules/dom0.te index 338caaf41e..8a817b0b55 100644 --- a/tools/flask/policy/modules/dom0.te +++ b/tools/flask/policy/modules/dom0.te @@ -16,7 +16,7 @@ allow dom0_t xen_t:xen { allow dom0_t xen_t:xen2 { resource_op psr_cmt_op psr_cat_op pmu_ctrl get_symbol get_cpu_levelling_caps get_cpu_featureset livepatch_op - gcov_op set_parameter + gcov_op set_parameter nvdimm_op }; # Allow dom0 to use all XENVER_ subops that have checks. diff --git a/xen/common/pmem.c b/xen/common/pmem.c index c9f5f6e904..d67f237cd5 100644 --- a/xen/common/pmem.c +++ b/xen/common/pmem.c @@ -131,6 +131,24 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm) return rc; } +/** + * Top-level hypercall handler of XEN_SYSCTL_nvdimm_pmem_*. + * + * Parameters: + * nvdimm: the hypercall parameters + * + * Return: + * On success, return 0. Otherwise, return a non-zero error code. 
+ */ +int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm) +{ +int rc = -ENOSYS; + +nvdimm->err = -rc; + +return rc; +} + #ifdef CONFIG_X86 int __init pmem_dom0_setup_permission(struct domain *d) diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c index a6882d1c9d..33c8fca081 100644 --- a/xen/common/sysctl.c +++ b/xen/common/sysctl.c @@ -28,6 +28,7 @@ #include #include #include +#include long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl) { @@ -503,6 +504,14 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl) break; } +#ifdef CONFIG_NVDIMM_PMEM +case XEN_SYSCTL_nvdimm_op: +ret = pmem_do_sysctl(>u.nvdimm); +if ( ret != -ENOSYS ) +copyback = 1; +break; +#endif + default: ret = arch_do_sysctl(op, u_sysctl); copyback = 0; diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h index 7830b987da..e8272ae968 100644 --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -36,7 +36,7 @@ #include "physdev.h" #include "tmem.h" -#define XEN_SYSCTL_INTERFACE_VERSION 0x000F +#define XEN_SYSCTL_INTERFACE_VERSION 0x0010 /* * Read console content from Xen buffer ring. @@ -1114,6 +1114,21 @@ struct xen_sysctl_set_parameter { typedef struct xen_sysctl_set_parameter xen_sysctl_set_parameter_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t); +/* + * Interface for NVDIMM management. + */ + +struct xen_sysctl_nvdimm_op { +uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */ +uint32_t pad; /* IN: Always zero. */ +union { +/* Parameters of XEN_SYSCTL_nvdimm_* will be added here. 
*/ +} u; +uint32_t err; /* OUT: error code */ +}; +typedef struct xen_sysctl_nvdimm_op xen_sysctl_nvdimm_op_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_op_t); + struct xen_sysctl { uint32_t cmd; #define XEN_SYSCTL_readconsole1 @@ -1143,6 +1158,7 @@ struct xen_sysctl { #define XEN_SYSCTL_get_cpu_featureset26 #define XEN_SYSCTL_livepatch_op 27 #define XEN_SYSCTL_set_parameter 28 +#define XEN_SYSCTL_nvdimm_op 29 uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */ union { struct xen_sysctl_readconsole readconsole; @@ -1172,6 +1188,7 @@ struct xen_sysctl { struct xen_sysctl_cpu_featuresetcpu_featureset; struct xen_sysctl_livepatch_op livepatch; struct xen_sysctl_set_parameter set_parameter; +struct xen_sysctl_nvdimm_op nvdimm; uint8_t pad[128]; } u; }; diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h index d5bd54ff19..922b12f570 100644 --- a/xen/include/xen/pmem.h +++ b/xen/include/xen/pmem.h @@ -20,9 +20,11 @@ #define __XEN_PMEM_H__ #ifdef CONFIG_NVDIMM_PMEM +#include #include int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm); +int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm); #ifdef CONFIG_X86 diff --git a/xen/xsm/flask/hooks.c b/xen/xsm
[Xen-devel] [RFC XEN PATCH v3 14/39] x86_64/mm: refactor memory_add()
Separate the revertible part of memory_add_common(), which will also be used in PMEM management. The separation will ease the failure recovery in PMEM management. Several coding-style issues in the touched code are fixed as well. No functional change is introduced. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- xen/arch/x86/x86_64/mm.c | 98 +++- 1 file changed, 56 insertions(+), 42 deletions(-) diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c index f635e4bf70..c8ffafe8a8 100644 --- a/xen/arch/x86/x86_64/mm.c +++ b/xen/arch/x86/x86_64/mm.c @@ -1337,21 +1337,16 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn) return 1; } -/* - * A bit paranoid for memory allocation failure issue since - * it may be reason for memory add - */ -int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm) +static int memory_add_common(struct mem_hotadd_info *info, + unsigned int pxm, bool direct_map) { -struct mem_hotadd_info info; +unsigned long spfn = info->spfn, epfn = info->epfn; int ret; nodeid_t node; unsigned long old_max = max_page, old_total = total_pages; unsigned long old_node_start, old_node_span, orig_online; unsigned long i; -dprintk(XENLOG_INFO, "memory_add %lx ~ %lx with pxm %x\n", spfn, epfn, pxm); - if ( !mem_hotadd_check(spfn, epfn) ) return -EINVAL; @@ -1366,22 +1361,25 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm) return -EINVAL; } -i = virt_to_mfn(HYPERVISOR_VIRT_END - 1) + 1; -if ( spfn < i ) -{ -ret = map_pages_to_xen((unsigned long)mfn_to_virt(spfn), spfn, - min(epfn, i) - spfn, PAGE_HYPERVISOR); -if ( ret ) -goto destroy_directmap; -} -if ( i < epfn ) +if ( direct_map ) { -if ( i < spfn ) -i = spfn; -ret = map_pages_to_xen((unsigned long)mfn_to_virt(i), i, - epfn - i, __PAGE_HYPERVISOR_RW); -if ( ret ) -goto destroy_directmap; +i = virt_to_mfn(HYPERVISOR_VIRT_END - 1) + 1; +if ( spfn 
< i ) +{ +ret = map_pages_to_xen((unsigned long)mfn_to_virt(spfn), spfn, + min(epfn, i) - spfn, PAGE_HYPERVISOR); +if ( ret ) +goto destroy_directmap; +} +if ( i < epfn ) +{ +if ( i < spfn ) +i = spfn; +ret = map_pages_to_xen((unsigned long)mfn_to_virt(i), i, + epfn - i, __PAGE_HYPERVISOR_RW); +if ( ret ) +goto destroy_directmap; +} } old_node_start = node_start_pfn(node); @@ -1398,22 +1396,18 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm) } else { -if (node_start_pfn(node) > spfn) +if ( node_start_pfn(node) > spfn ) NODE_DATA(node)->node_start_pfn = spfn; -if (node_end_pfn(node) < epfn) +if ( node_end_pfn(node) < epfn ) NODE_DATA(node)->node_spanned_pages = epfn - node_start_pfn(node); } -info.spfn = spfn; -info.epfn = epfn; -info.cur = spfn; - -ret = extend_frame_table(); +ret = extend_frame_table(info); if ( ret ) goto restore_node_status; /* Set max_page as setup_m2p_table will use it*/ -if (max_page < epfn) +if ( max_page < epfn ) { max_page = epfn; max_pdx = pfn_to_pdx(max_page - 1) + 1; @@ -1421,7 +1415,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm) total_pages += epfn - spfn; set_pdx_range(spfn, epfn); -ret = setup_m2p_table(); +ret = setup_m2p_table(info); if ( ret ) goto destroy_m2p; @@ -1429,11 +1423,12 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm) if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) ) { for ( i = spfn; i < epfn; i++ ) -if ( iommu_map_page(hardware_domain, i, i, IOMMUF_readable|IOMMUF_writable) ) +if ( iommu_map_page(hardware_domain, i, i, +IOMMUF_readable|IOMMUF_writable) ) break; if ( i != epfn ) { -while (i-- > old_max) +while ( i-- > old_max ) /* If statement to satisfy __must_check. */ if ( iommu_unmap_page(hardware_domain, i) ) continue; @@ -1442,14 +1437,10 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm) } } -/* We c
[Xen-devel] [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT
A PMEM region with failures (e.g., not properly flushed in the last power cycle, or some blocks within it are borken) cannot be safely used by Xen and guest. Scan the state flags of NVDIMM region mapping structures in NFIT to check whether any failures happened to a PMEM region. The recovery of those failure are left out of Xen (e.g. left to the firmware or other management utilities on the bare metal). Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- xen/arch/x86/acpi/boot.c | 4 ++ xen/drivers/acpi/nfit.c | 153 +- xen/include/acpi/actbl1.h | 26 xen/include/xen/acpi.h| 1 + 4 files changed, 183 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/acpi/boot.c b/xen/arch/x86/acpi/boot.c index 8e6c96dcf6..f52a2c6dc5 100644 --- a/xen/arch/x86/acpi/boot.c +++ b/xen/arch/x86/acpi/boot.c @@ -732,5 +732,9 @@ int __init acpi_boot_init(void) acpi_table_parse(ACPI_SIG_BGRT, acpi_invalidate_bgrt); +#ifdef CONFIG_NVDIMM_PMEM + acpi_nfit_init(); +#endif + return 0; } diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c index e099378ee0..b88a587b8d 100644 --- a/xen/drivers/acpi/nfit.c +++ b/xen/drivers/acpi/nfit.c @@ -31,11 +31,143 @@ static const uint8_t nfit_spa_pmem_guid[] = 0xac, 0x43, 0x0d, 0x33, 0x18, 0xb7, 0x8c, 0xdb, }; +struct nfit_spa_desc { +struct list_head link; +struct acpi_nfit_system_address *acpi_table; +}; + +struct nfit_memdev_desc { +struct list_head link; +struct acpi_nfit_memory_map *acpi_table; +struct nfit_spa_desc *spa_desc; +}; + struct acpi_nfit_desc { struct acpi_table_nfit *acpi_table; +struct list_head spa_list; +struct list_head memdev_list; }; -static struct acpi_nfit_desc nfit_desc; +static struct acpi_nfit_desc nfit_desc = { +.spa_list = LIST_HEAD_INIT(nfit_desc.spa_list), +.memdev_list = LIST_HEAD_INIT(nfit_desc.memdev_list), +}; + +static void __init acpi_nfit_del_subtables(struct acpi_nfit_desc *desc) +{ +struct nfit_spa_desc *spa, 
*spa_next; +struct nfit_memdev_desc *memdev, *memdev_next; + +list_for_each_entry_safe(spa, spa_next, >spa_list, link) +{ +list_del(>link); +xfree(spa); +} +list_for_each_entry_safe (memdev, memdev_next, >memdev_list, link) +{ +list_del(>link); +xfree(memdev); +} +} + +static int __init acpi_nfit_add_subtables(struct acpi_nfit_desc *desc) +{ +struct acpi_table_nfit *nfit_table = desc->acpi_table; +uint32_t hdr_offset = sizeof(*nfit_table); +uint32_t nfit_length = nfit_table->header.length; +struct acpi_nfit_header *hdr; +struct nfit_spa_desc *spa_desc; +struct nfit_memdev_desc *memdev_desc; +int ret = 0; + +#define INIT_DESC(desc, acpi_hdr, acpi_type, desc_list) \ +do {\ +(desc) = xzalloc(typeof(*(desc))); \ +if ( unlikely(!(desc)) ) { \ +ret = -ENOMEM; \ +goto nomem; \ +} \ +(desc)->acpi_table = (acpi_type *)(acpi_hdr); \ +INIT_LIST_HEAD(&(desc)->link); \ +list_add_tail(&(desc)->link, (desc_list)); \ +} while ( 0 ) + +while ( hdr_offset < nfit_length ) +{ +hdr = (void *)nfit_table + hdr_offset; +hdr_offset += hdr->length; + +switch ( hdr->type ) +{ +case ACPI_NFIT_TYPE_SYSTEM_ADDRESS: +INIT_DESC(spa_desc, hdr, struct acpi_nfit_system_address, + >spa_list); +break; + +case ACPI_NFIT_TYPE_MEMORY_MAP: +INIT_DESC(memdev_desc, hdr, struct acpi_nfit_memory_map, + >memdev_list); +break; + +default: +continue; +} +} + +#undef INIT_DESC + +return 0; + + nomem: +acpi_nfit_del_subtables(desc); + +return ret; +} + +static void __init acpi_nfit_link_subtables(struct acpi_nfit_desc *desc) +{ +struct nfit_spa_desc *spa_desc; +struct nfit_memdev_desc *memdev_desc; +uint16_t spa_idx; + +list_for_each_entry(memdev_desc, >memdev_list, link) +{ +spa_idx = memdev_desc->acpi_table->range_index; +list_for_each_entry(spa_desc, >spa_list, link) +{ +if ( spa_desc->acpi_table->range_index == spa_idx ) +break; +} +memdev_desc->spa_desc = spa_desc; +} +} + +static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc) +{ +struct nfit_spa_desc *spa_desc; +struct nfit_memdev_desc 
*memdev_desc; +struct acpi_nfit_system_address *spa; +unsigned long smfn, e
[Xen-devel] [RFC XEN PATCH v3 13/39] tools/xen-ndctl: add command 'list'
Two options are supported by command 'list'. '--raw' indicates to list all PMEM regions detected by Xen hypervisor, which can be later configured for future usages. '--all' indicates all other options (i.e. --raw and future options). Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/misc/xen-ndctl.c | 75 ++ 1 file changed, 75 insertions(+) diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c index de40e29ff6..6277a1eda2 100644 --- a/tools/misc/xen-ndctl.c +++ b/tools/misc/xen-ndctl.c @@ -27,12 +27,14 @@ #include #include +#include #include #include static xc_interface *xch; static int handle_help(int argc, char *argv[]); +static int handle_list(int argc, char *argv[]); static int handle_list_cmds(int argc, char *argv[]); static const struct xen_ndctl_cmd @@ -52,6 +54,15 @@ static const struct xen_ndctl_cmd .handler = handle_help, }, +{ +.name= "list", +.syntax = "[--all | --raw ]", +.help= "--all: the default option, list all PMEM regions of following types.\n" + "--raw: list all PMEM regions detected by Xen hypervisor.\n", +.handler = handle_list, +.need_xc = true, +}, + { .name= "list-cmds", .syntax = "", @@ -109,6 +120,70 @@ static int handle_help(int argc, char *argv[]) return 0; } +static int handle_list_raw(void) +{ +int rc; +unsigned int nr = 0, i; +xen_sysctl_nvdimm_pmem_raw_region_t *raw_list; + +rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_RAW, ); +if ( rc ) +{ +fprintf(stderr, "Cannot get the number of PMEM regions: %s.\n", +strerror(-rc)); +return rc; +} + +raw_list = malloc(nr * sizeof(*raw_list)); +if ( !raw_list ) +return -ENOMEM; + +rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_RAW, raw_list, ); +if ( rc ) +goto out; + +printf("Raw PMEM regions:\n"); +for ( i = 0; i < nr; i++ ) +printf(" %u: MFN 0x%lx - 0x%lx, PXM %u\n", + i, raw_list[i].smfn, raw_list[i].emfn, raw_list[i].pxm); + + out: +free(raw_list); + +return rc; +} + 
+static const struct list_handlers { +const char *option; +int (*handler)(void); +} list_hndrs[] = +{ +{ "--raw", handle_list_raw }, +}; + +static const unsigned int nr_list_hndrs = +sizeof(list_hndrs) / sizeof(list_hndrs[0]); + +static int handle_list(int argc, char *argv[]) +{ +bool list_all = argc <= 1 || !strcmp(argv[1], "--all"); +unsigned int i; +bool handled = false; +int rc = 0; + +for ( i = 0; i < nr_list_hndrs && !rc; i++) +if ( list_all || !strcmp(argv[1], list_hndrs[i].option) ) +{ +rc = list_hndrs[i].handler(); +handled = true; +} + +if ( !handled ) +return handle_unrecognized_argument(argv[0], argv[1]); + +return rc; +} + static int handle_list_cmds(int argc, char *argv[]) { unsigned int i; -- 2.14.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Graphical virtualization in intel® Atom is possible?
+Hongbo Wang from Intel GPU virtualization team On 08/17/17 06:36 +, Asharaf Perinchikkal wrote: > Hi All, > > We are trying to do graphical virtualization in intel® Atom™ > E3845(MinnowBoard Turbot Quad-Core board) using xen. > > Is it possible to do graphical virtualization in intel® Atom? > > If yes,Could you please suggest what are versions of xen and linux > recommended to use and steps i need to follow? > > Regards > Asharaf P
Re: [Xen-devel] Building XenGT for Intel embedded board
+Hongbo Wang from Intel GPU virtualization team On 08/10/17 22:47 +, Monisha Barooah wrote: > Hi Everyone, > I am currently exploring on bringing up XenGT for an Intel embedded board. > > I came across this document relating to bringing up XenGT for the Sandy > Bridge/Ivy Bridge/Haswell platform > https://www.intel.com/content/dam/www/public/us/en/documents/guides/xgengt-for-ivi-solutions-dev-kit-getting-started-guide.pdf > > Our current Intel embedded board is up with an Yocto image integrated with > the Intel BSP for the board. The board uses ABL boot loader. > > I saw in the XenGT document for the Sandy Bridge/Ivy Bridge/Haswell platform, > that there is mention of Qemu alone and no mention of any Intel BSPs. Don't > we require Intel BSP for dom0 kernel to work in the XenGT hypervisor? Or is a > generic version of Intel BSP integrated with the kernel image link > https://github.com/01org/XenGT-Preview-kernel.git. > > Also, as we have an Yocto image in the Intel board, we might have to cross > compile the Kernel, Xen and Qemu builds as mentioned in the link above for > our Intel embedded board using a Linaro toolchain. If not, is there a way, we > can link this particular version of XenGT directly with our Yocto image for > the Intel board by including the meta-virtualization layer as mentioned in > the link http://git.yoctoproject.org/cgit/cgit.cgi/meta-virtualization/about/ > and doing 'bitbake xen image minimal'? > > Please advise which is the correct route to take in this regard. > > Thanks > M
Re: [Xen-devel] Is possible to do GPU virtualization in Intel® Atom?
+Hongbo from Intel GPU virtualization team On 08/02/17 09:41 +, Asharaf Perinchikkal wrote: > Is possible to achieve GPU virtualization in Intel® Atom using para > virtualization? > > From: Roger Pau Monné [roger@citrix.com] > Sent: Wednesday, August 02, 2017 1:04 PM > To: Asharaf Perinchikkal > Cc: xen-devel@lists.xen.org; Anoop Babu > Subject: Re: [Xen-devel] Is possible to do GPU virtualization in Intel® Atom? > > On Tue, Aug 01, 2017 at 10:01:01AM +, Asharaf Perinchikkal wrote: > > Hi All, > > > > > > In Intel® Atom™ E3845(MinnowBoard Turbot Quad-Core board) has only support > > for Virtualization Technology (VT-x). > > > > No support for Intel® Virtualization Technology for Directed I/O (VT-d). > > [https://ark.intel.com/products/78475/Intel-Atom-Processor-E3845-2M-Cache-1_91-GHz] > > Without VT-d (IOMMU) you won't be able to passthrough any physical > device to a guest, so no, you won't be able to do GPU passthrough (at > least in a safe way). > > Roger.
Re: [Xen-devel] [PATCH v9 6/7] tools/libxc: add support of injecting MC# to specified CPUs
On 07/12/17 09:25 -0400, Konrad Rzeszutek Wilk wrote: > On Wed, Jul 12, 2017 at 10:04:39AM +0800, Haozhong Zhang wrote: > > Though XEN_MC_inject_v2 allows injecting MC# to specified CPUs, the > > current xc_mca_op() does not use this feature and not provide an > > interface to callers. This commit add a new xc_mca_op_inject_v2() that > > receives a cpumap providing the set of target CPUs. > > > > Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> > > Acked-by: Wei Liu <wei.l...@citrix.com> > > --- > > Cc: Ian Jackson <ian.jack...@eu.citrix.com> > > Cc: Wei Liu <wei.l...@citrix.com> > > --- > > tools/libxc/include/xenctrl.h | 2 ++ > > tools/libxc/xc_misc.c | 52 > > ++- > > 2 files changed, 53 insertions(+), 1 deletion(-) > > > > diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h > > index c51bb3b448..552a4fd47d 100644 > > --- a/tools/libxc/include/xenctrl.h > > +++ b/tools/libxc/include/xenctrl.h > > @@ -1809,6 +1809,8 @@ int xc_cpuid_apply_policy(xc_interface *xch, > > void xc_cpuid_to_str(const unsigned int *regs, > > char **strs); /* some strs[] may be NULL if ENOMEM */ > > int xc_mca_op(xc_interface *xch, struct xen_mc *mc); > > +int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags, > > +xc_cpumap_t cpumap, unsigned int nr_cpus); > > #endif > > > > struct xc_px_val { > > diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c > > index 88084fde30..2303293c6c 100644 > > --- a/tools/libxc/xc_misc.c > > +++ b/tools/libxc/xc_misc.c > > @@ -341,7 +341,57 @@ int xc_mca_op(xc_interface *xch, struct xen_mc *mc) > > xc_hypercall_bounce_post(xch, mc); > > return ret; > > } > > -#endif > > + > > +int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags, > > +xc_cpumap_t cpumap, unsigned int nr_bits) > > +{ > > +int ret = -1; > > +struct xen_mc mc_buf, *mc = &mc_buf; > > +struct xen_mc_inject_v2 *inject = &mc->u.mc_inject_v2; > > + > > +DECLARE_HYPERCALL_BOUNCE(cpumap, 0, XC_HYPERCALL_BUFFER_BOUNCE_IN); > > 
+DECLARE_HYPERCALL_BOUNCE(mc, sizeof(*mc), > > XC_HYPERCALL_BUFFER_BOUNCE_BOTH); > > + > > +memset(mc, 0, sizeof(*mc)); > > + > > +if ( cpumap ) > > +{ > > +if ( !nr_bits ) > > +{ > > +errno = EINVAL; > > +goto out; > > +} > > + > > +HYPERCALL_BOUNCE_SET_SIZE(cpumap, (nr_bits + 7) / 8); > > bitmap_size ? nr_bits is of type unsigned int, while bitmap_size() requires a signed int argument, though the number of CPUs passed via nr_bits in practice can be represented by a signed int. Haozhong ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v9 7/7] tools/xen-mceinj: add support of injecting LMCE
On 07/12/17 09:26 -0400, Konrad Rzeszutek Wilk wrote: > On Wed, Jul 12, 2017 at 10:04:40AM +0800, Haozhong Zhang wrote: > > If option '-l' or '--lmce' is specified and the host supports LMCE, > > xen-mceinj will inject LMCE to CPU specified by '-c' (or CPU0 if '-c' > > is not present). > > > > Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> > > Acked-by: Wei Liu <wei.l...@citrix.com> > > --- > > Cc: Ian Jackson <ian.jack...@eu.citrix.com> > > Cc: Wei Liu <wei.l...@citrix.com> > > --- > > tools/tests/mce-test/tools/xen-mceinj.c | 50 > > +++-- > > 1 file changed, 48 insertions(+), 2 deletions(-) > > > > diff --git a/tools/tests/mce-test/tools/xen-mceinj.c > > b/tools/tests/mce-test/tools/xen-mceinj.c > > index bae5a46eb5..380e42190c 100644 > > --- a/tools/tests/mce-test/tools/xen-mceinj.c > > +++ b/tools/tests/mce-test/tools/xen-mceinj.c [..] > > > > +static int inject_lmce(xc_interface *xc_handle, unsigned int cpu) > > +{ > > +uint8_t *cpumap = NULL; > > +size_t cpumap_size, line, shift; > > +unsigned int nr_cpus; > > +int ret; > > + > > +nr_cpus = mca_cpuinfo(xc_handle); > > +if ( !nr_cpus ) > > +err(xc_handle, "Failed to get mca_cpuinfo"); > > +if ( cpu >= nr_cpus ) > > +err(xc_handle, "-c %u is larger than %u", cpu, nr_cpus - 1); > > + > > +cpumap_size = (nr_cpus + 7) / 8; > > bitmap_size > IIUC, these bitmap_* functions/macros are libxc internals and should not be used here. Haozhong ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v9 7/7] tools/xen-mceinj: add support of injecting LMCE
If option '-l' or '--lmce' is specified and the host supports LMCE, xen-mceinj will inject LMCE to CPU specified by '-c' (or CPU0 if '-c' is not present). Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> Acked-by: Wei Liu <wei.l...@citrix.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/tests/mce-test/tools/xen-mceinj.c | 50 +++-- 1 file changed, 48 insertions(+), 2 deletions(-) diff --git a/tools/tests/mce-test/tools/xen-mceinj.c b/tools/tests/mce-test/tools/xen-mceinj.c index bae5a46eb5..380e42190c 100644 --- a/tools/tests/mce-test/tools/xen-mceinj.c +++ b/tools/tests/mce-test/tools/xen-mceinj.c @@ -56,6 +56,8 @@ #define MSR_IA32_MC0_MISC0x0403 #define MSR_IA32_MC0_CTL20x0280 +#define MCG_STATUS_LMCE 0x8 + struct mce_info { const char *description; uint8_t mcg_stat; @@ -113,6 +115,7 @@ static struct mce_info mce_table[] = { #define LOGFILE stdout int dump; +int lmce; struct xen_mc_msrinject msr_inj; static void Lprintf(const char *fmt, ...) 
@@ -212,6 +215,35 @@ static int inject_mce(xc_interface *xc_handle, int cpu_nr) return xc_mca_op(xc_handle, ); } +static int inject_lmce(xc_interface *xc_handle, unsigned int cpu) +{ +uint8_t *cpumap = NULL; +size_t cpumap_size, line, shift; +unsigned int nr_cpus; +int ret; + +nr_cpus = mca_cpuinfo(xc_handle); +if ( !nr_cpus ) +err(xc_handle, "Failed to get mca_cpuinfo"); +if ( cpu >= nr_cpus ) +err(xc_handle, "-c %u is larger than %u", cpu, nr_cpus - 1); + +cpumap_size = (nr_cpus + 7) / 8; +cpumap = malloc(cpumap_size); +if ( !cpumap ) +err(xc_handle, "Failed to allocate cpumap\n"); +memset(cpumap, 0, cpumap_size); +line = cpu / 8; +shift = cpu % 8; +memset(cpumap + line, 1 << shift, 1); + +ret = xc_mca_op_inject_v2(xc_handle, XEN_MC_INJECT_TYPE_LMCE, + cpumap, cpumap_size * 8); + +free(cpumap); +return ret; +} + static uint64_t bank_addr(int bank, int type) { uint64_t addr; @@ -330,8 +362,15 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce, uint32_t cpu_nr, uint32_t domain, uint64_t gaddr) { int ret = 0; +uint8_t mcg_status = mce->mcg_stat; -ret = inject_mcg_status(xc_handle, cpu_nr, mce->mcg_stat, domain); +if ( lmce ) +{ +if ( mce->cmci ) +err(xc_handle, "No support to inject CMCI as LMCE"); +mcg_status |= MCG_STATUS_LMCE; +} +ret = inject_mcg_status(xc_handle, cpu_nr, mcg_status, domain); if ( ret ) err(xc_handle, "Failed to inject MCG_STATUS MSR"); @@ -354,6 +393,8 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce, err(xc_handle, "Failed to inject MSR"); if ( mce->cmci ) ret = inject_cmci(xc_handle, cpu_nr); +else if ( lmce ) +ret = inject_lmce(xc_handle, cpu_nr); else ret = inject_mce(xc_handle, cpu_nr); if ( ret ) @@ -393,6 +434,7 @@ static struct option opts[] = { {"dump", 0, 0, 'D'}, {"help", 0, 0, 'h'}, {"page", 0, 0, 'p'}, +{"lmce", 0, 0, 'l'}, {"", 0, 0, '\0'} }; @@ -409,6 +451,7 @@ static void help(void) " -d, --domain=DOMID target domain, the default is Xen itself\n" " -h, --help print this page\n" " -p, --page=ADDR 
physical address to report\n" + " -l, --lmce inject as LMCE (Intel only)\n" " -t, --type=ERROR error type\n"); for ( i = 0; i < MCE_TABLE_SIZE; i++ ) @@ -438,7 +481,7 @@ int main(int argc, char *argv[]) } while ( 1 ) { -c = getopt_long(argc, argv, "c:Dd:t:hp:", opts, _index); +c = getopt_long(argc, argv, "c:Dd:t:hp:l", opts, _index); if ( c == -1 ) break; switch ( c ) { @@ -463,6 +506,9 @@ int main(int argc, char *argv[]) case 't': type = strtol(optarg, NULL, 0); break; +case 'l': +lmce = 1; +break; case 'h': default: help(); -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v9 5/7] xen/mce: add support of vLMCE injection to XEN_MC_inject_v2
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- xen/arch/x86/cpu/mcheck/mce.c | 24 +++- xen/include/public/arch-x86/xen-mca.h | 1 + 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c index ee04fb54ff..30525dd78b 100644 --- a/xen/arch/x86/cpu/mcheck/mce.c +++ b/xen/arch/x86/cpu/mcheck/mce.c @@ -1485,11 +1485,12 @@ long do_mca(XEN_GUEST_HANDLE_PARAM(xen_mc_t) u_xen_mc) { const cpumask_t *cpumap; cpumask_var_t cmv; +bool broadcast = op->u.mc_inject_v2.flags & XEN_MC_INJECT_CPU_BROADCAST; if (nr_mce_banks == 0) return x86_mcerr("do_mca #MC", -ENODEV); -if ( op->u.mc_inject_v2.flags & XEN_MC_INJECT_CPU_BROADCAST ) +if ( broadcast ) cpumap = &cpu_online_map; else { @@ -1529,6 +1530,27 @@ long do_mca(XEN_GUEST_HANDLE_PARAM(xen_mc_t) u_xen_mc) } break; +case XEN_MC_INJECT_TYPE_LMCE: +if ( !lmce_support ) +{ +ret = x86_mcerr("No LMCE support", -EINVAL); +break; +} +if ( broadcast ) +{ +ret = x86_mcerr("Broadcast cannot be used with LMCE", -EINVAL); +break; +} +/* Ensure at most one CPU is specified. 
*/ +if ( nr_cpu_ids > cpumask_next(cpumask_first(cpumap), cpumap) ) +{ +ret = x86_mcerr("More than one CPU specified for LMCE", +-EINVAL); +break; +} +on_selected_cpus(cpumap, x86_mc_mceinject, NULL, 1); +break; + default: ret = x86_mcerr("Wrong mca type\n", -EINVAL); break; diff --git a/xen/include/public/arch-x86/xen-mca.h b/xen/include/public/arch-x86/xen-mca.h index 7db990723b..dc35267249 100644 --- a/xen/include/public/arch-x86/xen-mca.h +++ b/xen/include/public/arch-x86/xen-mca.h @@ -414,6 +414,7 @@ struct xen_mc_mceinject { #define XEN_MC_INJECT_TYPE_MASK 0x7 #define XEN_MC_INJECT_TYPE_MCE 0x0 #define XEN_MC_INJECT_TYPE_CMCI 0x1 +#define XEN_MC_INJECT_TYPE_LMCE 0x2 #define XEN_MC_INJECT_CPU_BROADCAST 0x8 -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v9 2/7] x86/vmce: emulate MSR_IA32_MCG_EXT_CTL
If MCG_LMCE_P is present in guest MSR_IA32_MCG_CAP, then allow guest to read/write MSR_IA32_MCG_EXT_CTL. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- xen/arch/x86/cpu/mcheck/vmce.c | 34 +- xen/arch/x86/domctl.c | 2 ++ xen/include/asm-x86/mce.h | 1 + xen/include/public/arch-x86/hvm/save.h | 1 + 4 files changed, 37 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c index 1356f611ab..060e2d0582 100644 --- a/xen/arch/x86/cpu/mcheck/vmce.c +++ b/xen/arch/x86/cpu/mcheck/vmce.c @@ -91,6 +91,7 @@ int vmce_restore_vcpu(struct vcpu *v, const struct hvm_vmce_vcpu *ctxt) v->arch.vmce.mcg_cap = ctxt->caps; v->arch.vmce.bank[0].mci_ctl2 = ctxt->mci_ctl2_bank0; v->arch.vmce.bank[1].mci_ctl2 = ctxt->mci_ctl2_bank1; +v->arch.vmce.mcg_ext_ctl = ctxt->mcg_ext_ctl; return 0; } @@ -200,6 +201,26 @@ int vmce_rdmsr(uint32_t msr, uint64_t *val) mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_CTL %#"PRIx64"\n", cur, *val); break; +case MSR_IA32_MCG_EXT_CTL: +/* + * If MCG_LMCE_P is present in guest MSR_IA32_MCG_CAP, the LMCE and LOCK + * bits are always set in guest MSR_IA32_FEATURE_CONTROL by Xen, so it + * does not need to check them here. + */ +if ( cur->arch.vmce.mcg_cap & MCG_LMCE_P ) +{ +*val = cur->arch.vmce.mcg_ext_ctl; +mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_EXT_CTL %#"PRIx64"\n", + cur, *val); +} +else +{ +ret = -1; +mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_EXT_CTL, not supported\n", + cur); +} +break; + default: ret = mce_bank_msr(cur, msr) ? 
bank_mce_rdmsr(cur, msr, val) : 0; break; @@ -309,6 +330,16 @@ int vmce_wrmsr(uint32_t msr, uint64_t val) mce_printk(MCE_VERBOSE, "MCE: %pv: MCG_CAP is r/o\n", cur); break; +case MSR_IA32_MCG_EXT_CTL: +if ( (cur->arch.vmce.mcg_cap & MCG_LMCE_P) && + !(val & ~MCG_EXT_CTL_LMCE_EN) ) +cur->arch.vmce.mcg_ext_ctl = val; +else +ret = -1; +mce_printk(MCE_VERBOSE, "MCE: %pv: wr MCG_EXT_CTL %"PRIx64"%s\n", + cur, val, (ret == -1) ? ", not supported" : ""); +break; + default: ret = mce_bank_msr(cur, msr) ? bank_mce_wrmsr(cur, msr, val) : 0; break; @@ -327,7 +358,8 @@ static int vmce_save_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h) struct hvm_vmce_vcpu ctxt = { .caps = v->arch.vmce.mcg_cap, .mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2, -.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2 +.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2, +.mcg_ext_ctl = v->arch.vmce.mcg_ext_ctl, }; err = hvm_save_entry(VMCE_VCPU, v->vcpu_id, h, &ctxt); diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index 3637d32669..3628af2f70 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -315,6 +315,7 @@ static int vcpu_set_vmce(struct vcpu *v, static const unsigned int valid_sizes[] = { sizeof(evc->vmce), +VMCE_SIZE(mci_ctl2_bank1), VMCE_SIZE(caps), }; #undef VMCE_SIZE @@ -908,6 +909,7 @@ long arch_do_domctl( evc->vmce.caps = v->arch.vmce.mcg_cap; evc->vmce.mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2; evc->vmce.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2; +evc->vmce.mcg_ext_ctl = v->arch.vmce.mcg_ext_ctl; ret = 0; vcpu_unpause(v); diff --git a/xen/include/asm-x86/mce.h b/xen/include/asm-x86/mce.h index 56ad1f92dd..35f9962638 100644 --- a/xen/include/asm-x86/mce.h +++ b/xen/include/asm-x86/mce.h @@ -27,6 +27,7 @@ struct vmce_bank { struct vmce { uint64_t mcg_cap; uint64_t mcg_status; +uint64_t mcg_ext_ctl; spinlock_t lock; struct vmce_bank bank[GUEST_MC_BANK_NUM]; }; diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h index 
816973b9c2..fd7bf3fb38 100644 --- a/xen/include/public/arch-x86/hvm/save.h +++ b/xen/include/public/arch-x86/hvm/save.h @@ -610,6 +610,7 @@ struct hvm_vmce_vcpu { uint64_t caps; uint64_t mci_ctl2_bank0; uint64_t mci_ctl2_bank1; +uint64_t mcg_ext_ctl; }; DECLARE_HVM_SAVE_TYPE(VMCE_VCPU, 18, struct hvm_vmce_vcpu); -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v9 6/7] tools/libxc: add support of injecting MC# to specified CPUs
Though XEN_MC_inject_v2 allows injecting MC# to specified CPUs, the current xc_mca_op() does not use this feature nor provide an interface to callers. This commit adds a new xc_mca_op_inject_v2() that receives a cpumap providing the set of target CPUs. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> Acked-by: Wei Liu <wei.l...@citrix.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/libxc/include/xenctrl.h | 2 ++ tools/libxc/xc_misc.c | 52 ++- 2 files changed, 53 insertions(+), 1 deletion(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index c51bb3b448..552a4fd47d 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1809,6 +1809,8 @@ int xc_cpuid_apply_policy(xc_interface *xch, void xc_cpuid_to_str(const unsigned int *regs, char **strs); /* some strs[] may be NULL if ENOMEM */ int xc_mca_op(xc_interface *xch, struct xen_mc *mc); +int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags, +xc_cpumap_t cpumap, unsigned int nr_cpus); #endif struct xc_px_val { diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index 88084fde30..2303293c6c 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -341,7 +341,57 @@ int xc_mca_op(xc_interface *xch, struct xen_mc *mc) xc_hypercall_bounce_post(xch, mc); return ret; } -#endif + +int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags, +xc_cpumap_t cpumap, unsigned int nr_bits) +{ +int ret = -1; +struct xen_mc mc_buf, *mc = &mc_buf; +struct xen_mc_inject_v2 *inject = &mc->u.mc_inject_v2; + +DECLARE_HYPERCALL_BOUNCE(cpumap, 0, XC_HYPERCALL_BUFFER_BOUNCE_IN); +DECLARE_HYPERCALL_BOUNCE(mc, sizeof(*mc), XC_HYPERCALL_BUFFER_BOUNCE_BOTH); + +memset(mc, 0, sizeof(*mc)); + +if ( cpumap ) +{ +if ( !nr_bits ) +{ +errno = EINVAL; +goto out; +} + +HYPERCALL_BOUNCE_SET_SIZE(cpumap, (nr_bits + 7) / 8); +if ( xc_hypercall_bounce_pre(xch, cpumap) ) +{ +PERROR("Could not bounce cpumap memory buffer"); +goto 
out; +} +set_xen_guest_handle(inject->cpumap.bitmap, cpumap); +inject->cpumap.nr_bits = nr_bits; +} + +inject->flags = flags; +mc->cmd = XEN_MC_inject_v2; +mc->interface_version = XEN_MCA_INTERFACE_VERSION; + +if ( xc_hypercall_bounce_pre(xch, mc) ) +{ +PERROR("Could not bounce xen_mc memory buffer"); +goto out_free_cpumap; +} + +ret = xencall1(xch->xcall, __HYPERVISOR_mca, HYPERCALL_BUFFER_AS_ARG(mc)); + +xc_hypercall_bounce_post(xch, mc); +out_free_cpumap: +if ( cpumap ) +xc_hypercall_bounce_post(xch, cpumap); +out: +return ret; +} +#endif /* __i386__ || __x86_64__ */ int xc_perfc_reset(xc_interface *xch) { -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v9 1/7] x86/domctl: generalize the restore of vMCE parameters
vMCE parameters in struct xen_domctl_ext_vcpucontext were extended in the past, and are likely to be extended in the future. When migrating a PV domain from old Xen, XEN_DOMCTL_set_ext_vcpucontext should handle the differences. Instead of adding ad-hoc handling code at each extension, we introduce an array to record sizes of the current and all past versions of vMCE parameters, and search for the largest one that does not exceed the size of the passed-in parameters to determine the vMCE parameters that will be restored. If vMCE parameters are extended in the future, we only need to adapt the array to reflect the extension. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Changes in v9: * Rename "param" to "field" in macro VMCE_SIZE(). * Use min(..., sizeof(evc->vmce)) to get the size of vMCE parameters. --- xen/arch/x86/domctl.c | 55 +++ 1 file changed, 38 insertions(+), 17 deletions(-) diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index 7fa58b49af..3637d32669 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -302,6 +302,43 @@ static int update_domain_cpuid_info(struct domain *d, return 0; } +static int vcpu_set_vmce(struct vcpu *v, + const struct xen_domctl_ext_vcpucontext *evc) +{ +/* + * Sizes of vMCE parameters used by the current and past versions + * of Xen in descending order. If vMCE parameters are extended, + * remember to add the old size to this array by VMCE_SIZE(). 
+ */ +#define VMCE_SIZE(field) \ +(offsetof(typeof(evc->vmce), field) + sizeof(evc->vmce.field)) + +static const unsigned int valid_sizes[] = { +sizeof(evc->vmce), +VMCE_SIZE(caps), +}; +#undef VMCE_SIZE + +struct hvm_vmce_vcpu vmce = { }; +unsigned int evc_vmce_size = +min(evc->size - offsetof(typeof(*evc), mcg_cap), sizeof(evc->vmce)); +unsigned int i = 0; + +BUILD_BUG_ON(offsetof(typeof(*evc), mcg_cap) != + offsetof(typeof(*evc), vmce.caps)); +BUILD_BUG_ON(sizeof(evc->mcg_cap) != sizeof(evc->vmce.caps)); + +while ( i < ARRAY_SIZE(valid_sizes) && evc_vmce_size < valid_sizes[i] ) +++i; + +if ( i == ARRAY_SIZE(valid_sizes) ) +return 0; + +memcpy(&vmce, &evc->vmce, valid_sizes[i]); + +return vmce_restore_vcpu(v, &vmce); +} + void arch_get_domain_info(const struct domain *d, struct xen_domctl_getdomaininfo *info) { @@ -912,23 +949,7 @@ long arch_do_domctl( else domain_pause(d); -BUILD_BUG_ON(offsetof(struct xen_domctl_ext_vcpucontext, - mcg_cap) != - offsetof(struct xen_domctl_ext_vcpucontext, - vmce.caps)); -BUILD_BUG_ON(sizeof(evc->mcg_cap) != sizeof(evc->vmce.caps)); -if ( evc->size >= offsetof(typeof(*evc), vmce) + - sizeof(evc->vmce) ) -ret = vmce_restore_vcpu(v, &evc->vmce); -else if ( evc->size >= offsetof(typeof(*evc), mcg_cap) + - sizeof(evc->mcg_cap) ) -{ -struct hvm_vmce_vcpu vmce = { .caps = evc->mcg_cap }; - -ret = vmce_restore_vcpu(v, &vmce); -} -else -ret = 0; +ret = vcpu_set_vmce(v, evc); domain_unpause(d); } -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v9 0/7] Add LMCE support
Changes in v9: * Minor updates in patch 1 per Jan's comments. * Collect Jan's R-b in patch 2. Haozhong Zhang (7): [M ] x86/domctl: generalize the restore of vMCE parameters [ R ] x86/vmce: emulate MSR_IA32_MCG_EXT_CTL [ R ] x86/vmce: enable injecting LMCE to guest on Intel host [ RA] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP [ R ] xen/mce: add support of vLMCE injection to XEN_MC_inject_v2 [ A] tools/libxc: add support of injecting MC# to specified CPUs [ A] tools/xen-mceinj: add support of injecting LMCE N: new in this version M: modified in this version R: got R-b A: got A-b docs/man/xl.cfg.pod.5.in| 24 + tools/libxc/include/xenctrl.h | 2 ++ tools/libxc/xc_misc.c | 52 ++- tools/libxc/xc_sr_save_x86_hvm.c| 1 + tools/libxl/libxl.h | 7 tools/libxl/libxl_dom.c | 15 tools/libxl/libxl_types.idl | 1 + tools/tests/mce-test/tools/xen-mceinj.c | 50 -- tools/xl/xl_parse.c | 31 ++-- xen/arch/x86/cpu/mcheck/mcaction.c | 23 xen/arch/x86/cpu/mcheck/mce.c | 24 - xen/arch/x86/cpu/mcheck/mce.h | 1 + xen/arch/x86/cpu/mcheck/mce_intel.c | 2 +- xen/arch/x86/cpu/mcheck/vmce.c | 64 +++-- xen/arch/x86/cpu/mcheck/vmce.h | 2 +- xen/arch/x86/domctl.c | 57 - xen/arch/x86/hvm/hvm.c | 5 +++ xen/include/asm-x86/mce.h | 2 ++ xen/include/public/arch-x86/hvm/save.h | 1 + xen/include/public/arch-x86/xen-mca.h | 1 + xen/include/public/hvm/params.h | 7 +++- 21 files changed, 336 insertions(+), 36 deletions(-) -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v9 4/7] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP
If LMCE is supported by host and ' mca_caps = [ "lmce" ] ' is present in xl config, the LMCE capability will be exposed in guest MSR_IA32_MCG_CAP. By default, LMCE is not exposed to guest so as to keep the backwards migration compatibility. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> for hypervisor side Acked-by: Wei Liu <wei.l...@citrix.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- docs/man/xl.cfg.pod.5.in| 24 tools/libxc/xc_sr_save_x86_hvm.c| 1 + tools/libxl/libxl.h | 7 +++ tools/libxl/libxl_dom.c | 15 +++ tools/libxl/libxl_types.idl | 1 + tools/xl/xl_parse.c | 31 +-- xen/arch/x86/cpu/mcheck/mce.h | 1 + xen/arch/x86/cpu/mcheck/mce_intel.c | 2 +- xen/arch/x86/cpu/mcheck/vmce.c | 19 ++- xen/arch/x86/hvm/hvm.c | 5 + xen/include/asm-x86/mce.h | 1 + xen/include/public/hvm/params.h | 7 ++- 12 files changed, 109 insertions(+), 5 deletions(-) diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in index ff3203550f..79cb2eaea7 100644 --- a/docs/man/xl.cfg.pod.5.in +++ b/docs/man/xl.cfg.pod.5.in @@ -2173,6 +2173,30 @@ natively or via hardware backwards compatibility support. =back +=head3 x86 + +=over 4 + +=item B
[Xen-devel] [PATCH v9 3/7] x86/vmce: enable injecting LMCE to guest on Intel host
Inject LMCE to guest if the host MCE is LMCE and the affected vcpu is known. Otherwise, broadcast MCE to all vcpus on Intel host. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- xen/arch/x86/cpu/mcheck/mcaction.c | 23 --- xen/arch/x86/cpu/mcheck/vmce.c | 11 ++- xen/arch/x86/cpu/mcheck/vmce.h | 2 +- 3 files changed, 27 insertions(+), 9 deletions(-) diff --git a/xen/arch/x86/cpu/mcheck/mcaction.c b/xen/arch/x86/cpu/mcheck/mcaction.c index ca17d22bd8..f959bed2cb 100644 --- a/xen/arch/x86/cpu/mcheck/mcaction.c +++ b/xen/arch/x86/cpu/mcheck/mcaction.c @@ -44,6 +44,7 @@ mc_memerr_dhandler(struct mca_binfo *binfo, unsigned long mfn, gfn; uint32_t status; int vmce_vcpuid; +unsigned int mc_vcpuid; if (!mc_check_addr(bank->mc_status, bank->mc_misc, MC_ADDR_PHYSICAL)) { dprintk(XENLOG_WARNING, @@ -88,18 +89,26 @@ mc_memerr_dhandler(struct mca_binfo *binfo, goto vmce_failed; } -if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL || -global->mc_vcpuid == XEN_MC_VCPUID_INVALID) +mc_vcpuid = global->mc_vcpuid; +if (mc_vcpuid == XEN_MC_VCPUID_INVALID || +/* + * Because MC# may happen asynchronously with the actual + * operation that triggers the error, the domain ID as + * well as the vCPU ID collected in 'global' at MC# are + * not always precise. In that case, fallback to broadcast. 
+global->mc_domid != bank->mc_domid || +(boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && + (!(global->mc_gstatus & MCG_STATUS_LMCE) || + !(d->vcpu[mc_vcpuid]->arch.vmce.mcg_ext_ctl & +MCG_EXT_CTL_LMCE_EN)))) vmce_vcpuid = VMCE_INJECT_BROADCAST; else -vmce_vcpuid = global->mc_vcpuid; +vmce_vcpuid = mc_vcpuid; bank->mc_addr = gfn << PAGE_SHIFT | (bank->mc_addr & (PAGE_SIZE -1 )); -/* TODO: support injecting LMCE */ -if (fill_vmsr_data(bank, d, - global->mc_gstatus & ~MCG_STATUS_LMCE, - vmce_vcpuid == VMCE_INJECT_BROADCAST)) +if (fill_vmsr_data(bank, d, global->mc_gstatus, vmce_vcpuid)) { mce_printk(MCE_QUIET, "Fill vMCE# data for DOM%d " "failed\n", bank->mc_domid); diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c index 060e2d0582..e2b3c5b8cc 100644 --- a/xen/arch/x86/cpu/mcheck/vmce.c +++ b/xen/arch/x86/cpu/mcheck/vmce.c @@ -465,14 +465,23 @@ static int vcpu_fill_mc_msrs(struct vcpu *v, uint64_t mcg_status, } int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d, - uint64_t gstatus, bool broadcast) + uint64_t gstatus, int vmce_vcpuid) { struct vcpu *v = d->vcpu[0]; +bool broadcast = (vmce_vcpuid == VMCE_INJECT_BROADCAST); int ret, err; if ( mc_bank->mc_domid == DOMID_INVALID ) return -EINVAL; +if ( broadcast ) +gstatus &= ~MCG_STATUS_LMCE; +else if ( gstatus & MCG_STATUS_LMCE ) +{ +ASSERT(vmce_vcpuid >= 0 && vmce_vcpuid < d->max_vcpus); +v = d->vcpu[vmce_vcpuid]; +} + /* * vMCE with the actual error information is injected to vCPU0, * and, if broadcast is required, we choose to inject less severe diff --git a/xen/arch/x86/cpu/mcheck/vmce.h b/xen/arch/x86/cpu/mcheck/vmce.h index 74f6381460..2797e00275 100644 --- a/xen/arch/x86/cpu/mcheck/vmce.h +++ b/xen/arch/x86/cpu/mcheck/vmce.h @@ -17,7 +17,7 @@ int vmce_amd_rdmsr(const struct vcpu *, uint32_t msr, uint64_t *val); int vmce_amd_wrmsr(struct vcpu *, uint32_t msr, uint64_t val); int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d, - uint64_t gstatus, bool 
broadcast); + uint64_t gstatus, int vmce_vcpuid); #define VMCE_INJECT_BROADCAST (-1) int inject_vmce(struct domain *d, int vcpu); -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 0/7] Add LMCE support
Changes in v8: * Adjust the generalization of setting vMCE parameters in patch 1&2. * Other patches are not changed. Haozhong Zhang (7): [M ] 1/7 x86/domctl: generalize the restore of vMCE parameters [ M ] 2/7 x86/vmce: emulate MSR_IA32_MCG_EXT_CTL [ R ] 3/7 x86/vmce: enable injecting LMCE to guest on Intel host [ RA] 4/7 x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP [ R ] 5/7 xen/mce: add support of vLMCE injection to XEN_MC_inject_v2 [ A] 6/7 tools/libxc: add support of injecting MC# to specified CPUs [ A] 7/7 tools/xen-mceinj: add support of injecting LMCE N: new in this version M: modified in this version R: got R-b A: got A-b docs/man/xl.cfg.pod.5.in| 24 + tools/libxc/include/xenctrl.h | 2 ++ tools/libxc/xc_misc.c | 52 ++- tools/libxc/xc_sr_save_x86_hvm.c| 1 + tools/libxl/libxl.h | 7 tools/libxl/libxl_dom.c | 15 tools/libxl/libxl_types.idl | 1 + tools/tests/mce-test/tools/xen-mceinj.c | 50 -- tools/xl/xl_parse.c | 31 ++-- xen/arch/x86/cpu/mcheck/mcaction.c | 23 xen/arch/x86/cpu/mcheck/mce.c | 24 - xen/arch/x86/cpu/mcheck/mce.h | 1 + xen/arch/x86/cpu/mcheck/mce_intel.c | 2 +- xen/arch/x86/cpu/mcheck/vmce.c | 64 +++-- xen/arch/x86/cpu/mcheck/vmce.h | 2 +- xen/arch/x86/domctl.c | 56 - xen/arch/x86/hvm/hvm.c | 5 +++ xen/include/asm-x86/mce.h | 2 ++ xen/include/public/arch-x86/hvm/save.h | 1 + xen/include/public/arch-x86/xen-mca.h | 1 + xen/include/public/hvm/params.h | 7 +++- 21 files changed, 335 insertions(+), 36 deletions(-) -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 1/7] x86/domctl: generalize the restore of vMCE parameters
vMCE parameters in struct xen_domctl_ext_vcpucontext have been extended in the past and are likely to be extended in the future. When migrating a PV domain from old Xen, XEN_DOMCTL_set_ext_vcpucontext should handle the differences. Instead of adding ad-hoc handling code at each extension, we introduce an array to record sizes of the current and all past versions of vMCE parameters, and search for the largest one that does not exceed the size of the passed-in parameters to determine the vMCE parameters that will be restored. If vMCE parameters are extended in the future, we only need to adapt the array to reflect the extension. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Changes in v8: * Rename valid_vmce_size[] to valid_sizes[]. * Use offsetof() + sizeof() in valid_sizes[] and macroize it. * Remove element 0 from valid_sizes[]. * int i --> unsigned int i * Leave a blank line before the ending return. --- xen/arch/x86/domctl.c | 54 +++ 1 file changed, 37 insertions(+), 17 deletions(-) diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index 7fa58b49af..125537b96d 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -302,6 +302,42 @@ static int update_domain_cpuid_info(struct domain *d, return 0; } +static int vcpu_set_vmce(struct vcpu *v, + const struct xen_domctl_ext_vcpucontext *evc) +{ +/* + * Sizes of vMCE parameters used by the current and past versions + * of Xen in descending order. If vMCE parameters are extended, + * remember to add the old size to this array by VMCE_SIZE(). 
+ */ +#define VMCE_SIZE(param) \ +(offsetof(typeof(evc->vmce), param) + sizeof(evc->vmce.param)) + +static const unsigned int valid_sizes[] = { +sizeof(evc->vmce), +VMCE_SIZE(caps), +}; +#undef VMCE_SIZE + +struct hvm_vmce_vcpu vmce = { }; +unsigned int evc_vmce_size = evc->size - offsetof(typeof(*evc), mcg_cap); +unsigned int i = 0; + +BUILD_BUG_ON(offsetof(typeof(*evc), mcg_cap) != + offsetof(typeof(*evc), vmce.caps)); +BUILD_BUG_ON(sizeof(evc->mcg_cap) != sizeof(evc->vmce.caps)); + +while ( i < ARRAY_SIZE(valid_sizes) && evc_vmce_size < valid_sizes[i] ) +++i; + +if ( i == ARRAY_SIZE(valid_sizes) ) +return 0; + +memcpy(&vmce, &evc->vmce, valid_sizes[i]); + +return vmce_restore_vcpu(v, &vmce); +} + void arch_get_domain_info(const struct domain *d, struct xen_domctl_getdomaininfo *info) { @@ -912,23 +948,7 @@ long arch_do_domctl( else domain_pause(d); -BUILD_BUG_ON(offsetof(struct xen_domctl_ext_vcpucontext, - mcg_cap) != - offsetof(struct xen_domctl_ext_vcpucontext, - vmce.caps)); -BUILD_BUG_ON(sizeof(evc->mcg_cap) != sizeof(evc->vmce.caps)); -if ( evc->size >= offsetof(typeof(*evc), vmce) + - sizeof(evc->vmce) ) -ret = vmce_restore_vcpu(v, &evc->vmce); -else if ( evc->size >= offsetof(typeof(*evc), mcg_cap) + - sizeof(evc->mcg_cap) ) -{ -struct hvm_vmce_vcpu vmce = { .caps = evc->mcg_cap }; - -ret = vmce_restore_vcpu(v, &vmce); -} -else -ret = 0; +ret = vcpu_set_vmce(v, evc); domain_unpause(d); } -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 2/7] x86/vmce: emulate MSR_IA32_MCG_EXT_CTL
If MCG_LMCE_P is present in guest MSR_IA32_MCG_CAP, then allow guest to read/write MSR_IA32_MCG_EXT_CTL. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Changes in v8: * Use offsetof() + sizeof() (VMCE_SIZE()) in valid_sizes[]. --- xen/arch/x86/cpu/mcheck/vmce.c | 34 +- xen/arch/x86/domctl.c | 2 ++ xen/include/asm-x86/mce.h | 1 + xen/include/public/arch-x86/hvm/save.h | 1 + 4 files changed, 37 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c index 1356f611ab..060e2d0582 100644 --- a/xen/arch/x86/cpu/mcheck/vmce.c +++ b/xen/arch/x86/cpu/mcheck/vmce.c @@ -91,6 +91,7 @@ int vmce_restore_vcpu(struct vcpu *v, const struct hvm_vmce_vcpu *ctxt) v->arch.vmce.mcg_cap = ctxt->caps; v->arch.vmce.bank[0].mci_ctl2 = ctxt->mci_ctl2_bank0; v->arch.vmce.bank[1].mci_ctl2 = ctxt->mci_ctl2_bank1; +v->arch.vmce.mcg_ext_ctl = ctxt->mcg_ext_ctl; return 0; } @@ -200,6 +201,26 @@ int vmce_rdmsr(uint32_t msr, uint64_t *val) mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_CTL %#"PRIx64"\n", cur, *val); break; +case MSR_IA32_MCG_EXT_CTL: +/* + * If MCG_LMCE_P is present in guest MSR_IA32_MCG_CAP, the LMCE and LOCK + * bits are always set in guest MSR_IA32_FEATURE_CONTROL by Xen, so it + * does not need to check them here. + */ +if ( cur->arch.vmce.mcg_cap & MCG_LMCE_P ) +{ +*val = cur->arch.vmce.mcg_ext_ctl; +mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_EXT_CTL %#"PRIx64"\n", + cur, *val); +} +else +{ +ret = -1; +mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_EXT_CTL, not supported\n", + cur); +} +break; + default: ret = mce_bank_msr(cur, msr) ? 
bank_mce_rdmsr(cur, msr, val) : 0; break; @@ -309,6 +330,16 @@ int vmce_wrmsr(uint32_t msr, uint64_t val) mce_printk(MCE_VERBOSE, "MCE: %pv: MCG_CAP is r/o\n", cur); break; +case MSR_IA32_MCG_EXT_CTL: +if ( (cur->arch.vmce.mcg_cap & MCG_LMCE_P) && + !(val & ~MCG_EXT_CTL_LMCE_EN) ) +cur->arch.vmce.mcg_ext_ctl = val; +else +ret = -1; +mce_printk(MCE_VERBOSE, "MCE: %pv: wr MCG_EXT_CTL %"PRIx64"%s\n", + cur, val, (ret == -1) ? ", not supported" : ""); +break; + default: ret = mce_bank_msr(cur, msr) ? bank_mce_wrmsr(cur, msr, val) : 0; break; @@ -327,7 +358,8 @@ static int vmce_save_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h) struct hvm_vmce_vcpu ctxt = { .caps = v->arch.vmce.mcg_cap, .mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2, -.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2 +.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2, +.mcg_ext_ctl = v->arch.vmce.mcg_ext_ctl, }; err = hvm_save_entry(VMCE_VCPU, v->vcpu_id, h, &ctxt); diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index 125537b96d..5f8b5a5629 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -315,6 +315,7 @@ static int vcpu_set_vmce(struct vcpu *v, static const unsigned int valid_sizes[] = { sizeof(evc->vmce), +VMCE_SIZE(mci_ctl2_bank1), VMCE_SIZE(caps), }; #undef VMCE_SIZE @@ -907,6 +908,7 @@ long arch_do_domctl( evc->vmce.caps = v->arch.vmce.mcg_cap; evc->vmce.mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2; evc->vmce.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2; +evc->vmce.mcg_ext_ctl = v->arch.vmce.mcg_ext_ctl; ret = 0; vcpu_unpause(v); diff --git a/xen/include/asm-x86/mce.h b/xen/include/asm-x86/mce.h index 56ad1f92dd..35f9962638 100644 --- a/xen/include/asm-x86/mce.h +++ b/xen/include/asm-x86/mce.h @@ -27,6 +27,7 @@ struct vmce_bank { struct vmce { uint64_t mcg_cap; uint64_t mcg_status; +uint64_t mcg_ext_ctl; spinlock_t lock; struct vmce_bank bank[GUEST_MC_BANK_NUM]; }; diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h index 
816973b9c2..fd7bf3fb38 100644 --- a/xen/include/public/arch-x86/hvm/save.h +++ b/xen/include/public/arch-x86/hvm/save.h @@ -610,6 +610,7 @@ struct hvm_vmce_vcpu { uint64_t caps; uint64_t mci_ctl2_bank0; uint64_t mci_ctl2_bank1; +uint64_t mcg_ext_ctl; }; DECLARE_HVM_SAVE_TYPE(VMCE_VCPU, 18, struct hvm_vmce_vcpu); -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 4/7] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP
If LMCE is supported by host and ' mca_caps = [ "lmce" ] ' is present in xl config, the LMCE capability will be exposed in guest MSR_IA32_MCG_CAP. By default, LMCE is not exposed to guest so as to keep the backwards migration compatibility. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> for hypervisor side Acked-by: Wei Liu <wei.l...@citrix.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- docs/man/xl.cfg.pod.5.in| 24 tools/libxc/xc_sr_save_x86_hvm.c| 1 + tools/libxl/libxl.h | 7 +++ tools/libxl/libxl_dom.c | 15 +++ tools/libxl/libxl_types.idl | 1 + tools/xl/xl_parse.c | 31 +-- xen/arch/x86/cpu/mcheck/mce.h | 1 + xen/arch/x86/cpu/mcheck/mce_intel.c | 2 +- xen/arch/x86/cpu/mcheck/vmce.c | 19 ++- xen/arch/x86/hvm/hvm.c | 5 + xen/include/asm-x86/mce.h | 1 + xen/include/public/hvm/params.h | 7 ++- 12 files changed, 109 insertions(+), 5 deletions(-) diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in index ff3203550f..79cb2eaea7 100644 --- a/docs/man/xl.cfg.pod.5.in +++ b/docs/man/xl.cfg.pod.5.in @@ -2173,6 +2173,30 @@ natively or via hardware backwards compatibility support. =back +=head3 x86 + +=over 4 + +=item B
[Xen-devel] [PATCH v8 3/7] x86/vmce: enable injecting LMCE to guest on Intel host
Inject LMCE to guest if the host MCE is LMCE and the affected vcpu is known. Otherwise, broadcast MCE to all vcpus on Intel host. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- xen/arch/x86/cpu/mcheck/mcaction.c | 23 --- xen/arch/x86/cpu/mcheck/vmce.c | 11 ++- xen/arch/x86/cpu/mcheck/vmce.h | 2 +- 3 files changed, 27 insertions(+), 9 deletions(-) diff --git a/xen/arch/x86/cpu/mcheck/mcaction.c b/xen/arch/x86/cpu/mcheck/mcaction.c index ca17d22bd8..f959bed2cb 100644 --- a/xen/arch/x86/cpu/mcheck/mcaction.c +++ b/xen/arch/x86/cpu/mcheck/mcaction.c @@ -44,6 +44,7 @@ mc_memerr_dhandler(struct mca_binfo *binfo, unsigned long mfn, gfn; uint32_t status; int vmce_vcpuid; +unsigned int mc_vcpuid; if (!mc_check_addr(bank->mc_status, bank->mc_misc, MC_ADDR_PHYSICAL)) { dprintk(XENLOG_WARNING, @@ -88,18 +89,26 @@ mc_memerr_dhandler(struct mca_binfo *binfo, goto vmce_failed; } -if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL || -global->mc_vcpuid == XEN_MC_VCPUID_INVALID) +mc_vcpuid = global->mc_vcpuid; +if (mc_vcpuid == XEN_MC_VCPUID_INVALID || +/* + * Because MC# may happen asynchronously with the actual + * operation that triggers the error, the domain ID as + * well as the vCPU ID collected in 'global' at MC# are + * not always precise. In that case, fallback to broadcast. 
+ */ +global->mc_domid != bank->mc_domid || +(boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && + (!(global->mc_gstatus & MCG_STATUS_LMCE) || + !(d->vcpu[mc_vcpuid]->arch.vmce.mcg_ext_ctl & +MCG_EXT_CTL_LMCE_EN)))) vmce_vcpuid = VMCE_INJECT_BROADCAST; else -vmce_vcpuid = global->mc_vcpuid; +vmce_vcpuid = mc_vcpuid; bank->mc_addr = gfn << PAGE_SHIFT | (bank->mc_addr & (PAGE_SIZE - 1)); -/* TODO: support injecting LMCE */ -if (fill_vmsr_data(bank, d, - global->mc_gstatus & ~MCG_STATUS_LMCE, - vmce_vcpuid == VMCE_INJECT_BROADCAST)) +if (fill_vmsr_data(bank, d, global->mc_gstatus, vmce_vcpuid)) { mce_printk(MCE_QUIET, "Fill vMCE# data for DOM%d " "failed\n", bank->mc_domid); diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c index 060e2d0582..e2b3c5b8cc 100644 --- a/xen/arch/x86/cpu/mcheck/vmce.c +++ b/xen/arch/x86/cpu/mcheck/vmce.c @@ -465,14 +465,23 @@ static int vcpu_fill_mc_msrs(struct vcpu *v, uint64_t mcg_status, } int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d, - uint64_t gstatus, bool broadcast) + uint64_t gstatus, int vmce_vcpuid) { struct vcpu *v = d->vcpu[0]; +bool broadcast = (vmce_vcpuid == VMCE_INJECT_BROADCAST); int ret, err; if ( mc_bank->mc_domid == DOMID_INVALID ) return -EINVAL; +if ( broadcast ) +gstatus &= ~MCG_STATUS_LMCE; +else if ( gstatus & MCG_STATUS_LMCE ) +{ +ASSERT(vmce_vcpuid >= 0 && vmce_vcpuid < d->max_vcpus); +v = d->vcpu[vmce_vcpuid]; +} + /* * vMCE with the actual error information is injected to vCPU0, * and, if broadcast is required, we choose to inject less severe diff --git a/xen/arch/x86/cpu/mcheck/vmce.h b/xen/arch/x86/cpu/mcheck/vmce.h index 74f6381460..2797e00275 100644 --- a/xen/arch/x86/cpu/mcheck/vmce.h +++ b/xen/arch/x86/cpu/mcheck/vmce.h @@ -17,7 +17,7 @@ int vmce_amd_rdmsr(const struct vcpu *, uint32_t msr, uint64_t *val); int vmce_amd_wrmsr(struct vcpu *, uint32_t msr, uint64_t val); int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d, - uint64_t gstatus, bool 
broadcast); + uint64_t gstatus, int vmce_vcpuid); #define VMCE_INJECT_BROADCAST (-1) int inject_vmce(struct domain *d, int vcpu); -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 6/7] tools/libxc: add support of injecting MC# to specified CPUs
Though XEN_MC_inject_v2 allows injecting MC# to specified CPUs, the current xc_mca_op() does not use this feature and does not provide an interface to callers. This commit adds a new xc_mca_op_inject_v2() that receives a cpumap providing the set of target CPUs. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> Acked-by: Wei Liu <wei.l...@citrix.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/libxc/include/xenctrl.h | 2 ++ tools/libxc/xc_misc.c | 52 ++- 2 files changed, 53 insertions(+), 1 deletion(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index c51bb3b448..552a4fd47d 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1809,6 +1809,8 @@ int xc_cpuid_apply_policy(xc_interface *xch, void xc_cpuid_to_str(const unsigned int *regs, char **strs); /* some strs[] may be NULL if ENOMEM */ int xc_mca_op(xc_interface *xch, struct xen_mc *mc); +int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags, +xc_cpumap_t cpumap, unsigned int nr_cpus); #endif struct xc_px_val { diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index 88084fde30..2303293c6c 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -341,7 +341,57 @@ int xc_mca_op(xc_interface *xch, struct xen_mc *mc) xc_hypercall_bounce_post(xch, mc); return ret; } -#endif + +int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags, +xc_cpumap_t cpumap, unsigned int nr_bits) +{ +int ret = -1; +struct xen_mc mc_buf, *mc = &mc_buf; +struct xen_mc_inject_v2 *inject = &mc->u.mc_inject_v2; + +DECLARE_HYPERCALL_BOUNCE(cpumap, 0, XC_HYPERCALL_BUFFER_BOUNCE_IN); +DECLARE_HYPERCALL_BOUNCE(mc, sizeof(*mc), XC_HYPERCALL_BUFFER_BOUNCE_BOTH); + +memset(mc, 0, sizeof(*mc)); + +if ( cpumap ) +{ +if ( !nr_bits ) +{ +errno = EINVAL; +goto out; +} + +HYPERCALL_BOUNCE_SET_SIZE(cpumap, (nr_bits + 7) / 8); +if ( xc_hypercall_bounce_pre(xch, cpumap) ) +{ +PERROR("Could not bounce cpumap memory buffer"); +goto 
out; +} +set_xen_guest_handle(inject->cpumap.bitmap, cpumap); +inject->cpumap.nr_bits = nr_bits; +} + +inject->flags = flags; +mc->cmd = XEN_MC_inject_v2; +mc->interface_version = XEN_MCA_INTERFACE_VERSION; + +if ( xc_hypercall_bounce_pre(xch, mc) ) +{ +PERROR("Could not bounce xen_mc memory buffer"); +goto out_free_cpumap; +} + +ret = xencall1(xch->xcall, __HYPERVISOR_mca, HYPERCALL_BUFFER_AS_ARG(mc)); + +xc_hypercall_bounce_post(xch, mc); +out_free_cpumap: +if ( cpumap ) +xc_hypercall_bounce_post(xch, cpumap); +out: +return ret; +} +#endif /* __i386__ || __x86_64__ */ int xc_perfc_reset(xc_interface *xch) { -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 5/7] xen/mce: add support of vLMCE injection to XEN_MC_inject_v2
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- xen/arch/x86/cpu/mcheck/mce.c | 24 +++- xen/include/public/arch-x86/xen-mca.h | 1 + 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c index ee04fb54ff..30525dd78b 100644 --- a/xen/arch/x86/cpu/mcheck/mce.c +++ b/xen/arch/x86/cpu/mcheck/mce.c @@ -1485,11 +1485,12 @@ long do_mca(XEN_GUEST_HANDLE_PARAM(xen_mc_t) u_xen_mc) { const cpumask_t *cpumap; cpumask_var_t cmv; +bool broadcast = op->u.mc_inject_v2.flags & XEN_MC_INJECT_CPU_BROADCAST; if (nr_mce_banks == 0) return x86_mcerr("do_mca #MC", -ENODEV); -if ( op->u.mc_inject_v2.flags & XEN_MC_INJECT_CPU_BROADCAST ) +if ( broadcast ) cpumap = &cpu_online_map; else { @@ -1529,6 +1530,27 @@ long do_mca(XEN_GUEST_HANDLE_PARAM(xen_mc_t) u_xen_mc) } break; +case XEN_MC_INJECT_TYPE_LMCE: +if ( !lmce_support ) +{ +ret = x86_mcerr("No LMCE support", -EINVAL); +break; +} +if ( broadcast ) +{ +ret = x86_mcerr("Broadcast cannot be used with LMCE", -EINVAL); +break; +} +/* Ensure at most one CPU is specified. 
*/ +if ( nr_cpu_ids > cpumask_next(cpumask_first(cpumap), cpumap) ) +{ +ret = x86_mcerr("More than one CPU specified for LMCE", +-EINVAL); +break; +} +on_selected_cpus(cpumap, x86_mc_mceinject, NULL, 1); +break; + default: ret = x86_mcerr("Wrong mca type\n", -EINVAL); break; diff --git a/xen/include/public/arch-x86/xen-mca.h b/xen/include/public/arch-x86/xen-mca.h index 7db990723b..dc35267249 100644 --- a/xen/include/public/arch-x86/xen-mca.h +++ b/xen/include/public/arch-x86/xen-mca.h @@ -414,6 +414,7 @@ struct xen_mc_mceinject { #define XEN_MC_INJECT_TYPE_MASK 0x7 #define XEN_MC_INJECT_TYPE_MCE 0x0 #define XEN_MC_INJECT_TYPE_CMCI 0x1 +#define XEN_MC_INJECT_TYPE_LMCE 0x2 #define XEN_MC_INJECT_CPU_BROADCAST 0x8 -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 7/7] tools/xen-mceinj: add support of injecting LMCE
If option '-l' or '--lmce' is specified and the host supports LMCE, xen-mceinj will inject LMCE to the CPU specified by '-c' (or CPU0 if '-c' is not present). Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> Acked-by: Wei Liu <wei.l...@citrix.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/tests/mce-test/tools/xen-mceinj.c | 50 +++-- 1 file changed, 48 insertions(+), 2 deletions(-) diff --git a/tools/tests/mce-test/tools/xen-mceinj.c b/tools/tests/mce-test/tools/xen-mceinj.c index bae5a46eb5..380e42190c 100644 --- a/tools/tests/mce-test/tools/xen-mceinj.c +++ b/tools/tests/mce-test/tools/xen-mceinj.c @@ -56,6 +56,8 @@ #define MSR_IA32_MC0_MISC 0x0403 #define MSR_IA32_MC0_CTL2 0x0280 +#define MCG_STATUS_LMCE 0x8 + struct mce_info { const char *description; uint8_t mcg_stat; @@ -113,6 +115,7 @@ static struct mce_info mce_table[] = { #define LOGFILE stdout int dump; +int lmce; struct xen_mc_msrinject msr_inj; static void Lprintf(const char *fmt, ...) 
@@ -212,6 +215,35 @@ static int inject_mce(xc_interface *xc_handle, int cpu_nr) return xc_mca_op(xc_handle, &mc); } +static int inject_lmce(xc_interface *xc_handle, unsigned int cpu) +{ +uint8_t *cpumap = NULL; +size_t cpumap_size, line, shift; +unsigned int nr_cpus; +int ret; + +nr_cpus = mca_cpuinfo(xc_handle); +if ( !nr_cpus ) +err(xc_handle, "Failed to get mca_cpuinfo"); +if ( cpu >= nr_cpus ) +err(xc_handle, "-c %u is larger than %u", cpu, nr_cpus - 1); + +cpumap_size = (nr_cpus + 7) / 8; +cpumap = malloc(cpumap_size); +if ( !cpumap ) +err(xc_handle, "Failed to allocate cpumap\n"); +memset(cpumap, 0, cpumap_size); +line = cpu / 8; +shift = cpu % 8; +memset(cpumap + line, 1 << shift, 1); + +ret = xc_mca_op_inject_v2(xc_handle, XEN_MC_INJECT_TYPE_LMCE, + cpumap, cpumap_size * 8); + +free(cpumap); +return ret; +} + static uint64_t bank_addr(int bank, int type) { uint64_t addr; @@ -330,8 +362,15 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce, uint32_t cpu_nr, uint32_t domain, uint64_t gaddr) { int ret = 0; +uint8_t mcg_status = mce->mcg_stat; -ret = inject_mcg_status(xc_handle, cpu_nr, mce->mcg_stat, domain); +if ( lmce ) +{ +if ( mce->cmci ) +err(xc_handle, "No support to inject CMCI as LMCE"); +mcg_status |= MCG_STATUS_LMCE; +} +ret = inject_mcg_status(xc_handle, cpu_nr, mcg_status, domain); if ( ret ) err(xc_handle, "Failed to inject MCG_STATUS MSR"); @@ -354,6 +393,8 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce, err(xc_handle, "Failed to inject MSR"); if ( mce->cmci ) ret = inject_cmci(xc_handle, cpu_nr); +else if ( lmce ) +ret = inject_lmce(xc_handle, cpu_nr); else ret = inject_mce(xc_handle, cpu_nr); if ( ret ) @@ -393,6 +434,7 @@ static struct option opts[] = { {"dump", 0, 0, 'D'}, {"help", 0, 0, 'h'}, {"page", 0, 0, 'p'}, +{"lmce", 0, 0, 'l'}, {"", 0, 0, '\0'} }; @@ -409,6 +451,7 @@ static void help(void) " -d, --domain=DOMID target domain, the default is Xen itself\n" " -h, --help print this page\n" " -p, --page=ADDR 
physical address to report\n" + " -l, --lmce inject as LMCE (Intel only)\n" " -t, --type=ERROR error type\n"); for ( i = 0; i < MCE_TABLE_SIZE; i++ ) @@ -438,7 +481,7 @@ int main(int argc, char *argv[]) } while ( 1 ) { -c = getopt_long(argc, argv, "c:Dd:t:hp:", opts, _index); +c = getopt_long(argc, argv, "c:Dd:t:hp:l", opts, _index); if ( c == -1 ) break; switch ( c ) { @@ -463,6 +506,9 @@ int main(int argc, char *argv[]) case 't': type = strtol(optarg, NULL, 0); break; +case 'l': +lmce = 1; +break; case 'h': default: help(); -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v7 4/7] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP
If LMCE is supported by host and ' mca_caps = [ "lmce" ] ' is present in xl config, the LMCE capability will be exposed in guest MSR_IA32_MCG_CAP. By default, LMCE is not exposed to guest so as to keep the backwards migration compatibility. Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> for hypervisor side Acked-by: Wei Liu <wei.l...@citrix.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Jan Beulich <jbeul...@suse.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> --- docs/man/xl.cfg.pod.5.in| 24 tools/libxc/xc_sr_save_x86_hvm.c| 1 + tools/libxl/libxl.h | 7 +++ tools/libxl/libxl_dom.c | 15 +++ tools/libxl/libxl_types.idl | 1 + tools/xl/xl_parse.c | 31 +-- xen/arch/x86/cpu/mcheck/mce.h | 1 + xen/arch/x86/cpu/mcheck/mce_intel.c | 2 +- xen/arch/x86/cpu/mcheck/vmce.c | 19 ++- xen/arch/x86/hvm/hvm.c | 5 + xen/include/asm-x86/mce.h | 1 + xen/include/public/hvm/params.h | 7 ++- 12 files changed, 109 insertions(+), 5 deletions(-) diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in index ff3203550f..79cb2eaea7 100644 --- a/docs/man/xl.cfg.pod.5.in +++ b/docs/man/xl.cfg.pod.5.in @@ -2173,6 +2173,30 @@ natively or via hardware backwards compatibility support. =back +=head3 x86 + +=over 4 + +=item B
[Xen-devel] [PATCH v7 0/7] Add LMCE support
v7 is based on staging branch and only contains the remaining patches. Changes in v7: * (Patch 1) Introduce a general way to restore vMCE parameters. * (Patch 2) Adapt to the change in patch 1. * Other patch 3 - 7 remain the same as v5 patch 7 - 11. Haozhong Zhang (7): [N ] 1/7 x86/domctl: generalize the restore of vMCE parameters [ M ] 2/7 x86/vmce: emulate MSR_IA32_MCG_EXT_CTL [ R ] 3/7 x86/vmce: enable injecting LMCE to guest on Intel host [ RA] 4/7 x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP [ R ] 5/7 xen/mce: add support of vLMCE injection to XEN_MC_inject_v2 [ A] 6/7 tools/libxc: add support of injecting MC# to specified CPUs [ A] 7/7 tools/xen-mceinj: add support of injecting LMCE N: new in this version M: modified in this version R: got R-b A: got A-b docs/man/xl.cfg.pod.5.in| 24 + tools/libxc/include/xenctrl.h | 2 ++ tools/libxc/xc_misc.c | 52 ++- tools/libxc/xc_sr_save_x86_hvm.c| 1 + tools/libxl/libxl.h | 7 tools/libxl/libxl_dom.c | 15 tools/libxl/libxl_types.idl | 1 + tools/tests/mce-test/tools/xen-mceinj.c | 50 -- tools/xl/xl_parse.c | 31 ++-- xen/arch/x86/cpu/mcheck/mcaction.c | 23 xen/arch/x86/cpu/mcheck/mce.c | 24 - xen/arch/x86/cpu/mcheck/mce.h | 1 + xen/arch/x86/cpu/mcheck/mce_intel.c | 2 +- xen/arch/x86/cpu/mcheck/vmce.c | 64 +++-- xen/arch/x86/cpu/mcheck/vmce.h | 2 +- xen/arch/x86/domctl.c | 53 ++- xen/arch/x86/hvm/hvm.c | 5 +++ xen/include/asm-x86/mce.h | 2 ++ xen/include/public/arch-x86/hvm/save.h | 1 + xen/include/public/arch-x86/xen-mca.h | 1 + xen/include/public/hvm/params.h | 7 +++- 21 files changed, 332 insertions(+), 36 deletions(-) -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v7 7/7] tools/xen-mceinj: add support of injecting LMCE
If option '-l' or '--lmce' is specified and the host supports LMCE, xen-mceinj will inject LMCE to the CPU specified by '-c' (or CPU0 if '-c' is not present). Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com> Acked-by: Wei Liu <wei.l...@citrix.com> --- Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> --- tools/tests/mce-test/tools/xen-mceinj.c | 50 +++-- 1 file changed, 48 insertions(+), 2 deletions(-) diff --git a/tools/tests/mce-test/tools/xen-mceinj.c b/tools/tests/mce-test/tools/xen-mceinj.c index bae5a46eb5..380e42190c 100644 --- a/tools/tests/mce-test/tools/xen-mceinj.c +++ b/tools/tests/mce-test/tools/xen-mceinj.c @@ -56,6 +56,8 @@ #define MSR_IA32_MC0_MISC 0x0403 #define MSR_IA32_MC0_CTL2 0x0280 +#define MCG_STATUS_LMCE 0x8 + struct mce_info { const char *description; uint8_t mcg_stat; @@ -113,6 +115,7 @@ static struct mce_info mce_table[] = { #define LOGFILE stdout int dump; +int lmce; struct xen_mc_msrinject msr_inj; static void Lprintf(const char *fmt, ...) 
@@ -212,6 +215,35 @@ static int inject_mce(xc_interface *xc_handle, int cpu_nr) return xc_mca_op(xc_handle, &mc); } +static int inject_lmce(xc_interface *xc_handle, unsigned int cpu) +{ +uint8_t *cpumap = NULL; +size_t cpumap_size, line, shift; +unsigned int nr_cpus; +int ret; + +nr_cpus = mca_cpuinfo(xc_handle); +if ( !nr_cpus ) +err(xc_handle, "Failed to get mca_cpuinfo"); +if ( cpu >= nr_cpus ) +err(xc_handle, "-c %u is larger than %u", cpu, nr_cpus - 1); + +cpumap_size = (nr_cpus + 7) / 8; +cpumap = malloc(cpumap_size); +if ( !cpumap ) +err(xc_handle, "Failed to allocate cpumap\n"); +memset(cpumap, 0, cpumap_size); +line = cpu / 8; +shift = cpu % 8; +memset(cpumap + line, 1 << shift, 1); + +ret = xc_mca_op_inject_v2(xc_handle, XEN_MC_INJECT_TYPE_LMCE, + cpumap, cpumap_size * 8); + +free(cpumap); +return ret; +} + static uint64_t bank_addr(int bank, int type) { uint64_t addr; @@ -330,8 +362,15 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce, uint32_t cpu_nr, uint32_t domain, uint64_t gaddr) { int ret = 0; +uint8_t mcg_status = mce->mcg_stat; -ret = inject_mcg_status(xc_handle, cpu_nr, mce->mcg_stat, domain); +if ( lmce ) +{ +if ( mce->cmci ) +err(xc_handle, "No support to inject CMCI as LMCE"); +mcg_status |= MCG_STATUS_LMCE; +} +ret = inject_mcg_status(xc_handle, cpu_nr, mcg_status, domain); if ( ret ) err(xc_handle, "Failed to inject MCG_STATUS MSR"); @@ -354,6 +393,8 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce, err(xc_handle, "Failed to inject MSR"); if ( mce->cmci ) ret = inject_cmci(xc_handle, cpu_nr); +else if ( lmce ) +ret = inject_lmce(xc_handle, cpu_nr); else ret = inject_mce(xc_handle, cpu_nr); if ( ret ) @@ -393,6 +434,7 @@ static struct option opts[] = { {"dump", 0, 0, 'D'}, {"help", 0, 0, 'h'}, {"page", 0, 0, 'p'}, +{"lmce", 0, 0, 'l'}, {"", 0, 0, '\0'} }; @@ -409,6 +451,7 @@ static void help(void) " -d, --domain=DOMID target domain, the default is Xen itself\n" " -h, --help print this page\n" " -p, --page=ADDR 
physical address to report\n" + " -l, --lmce inject as LMCE (Intel only)\n" " -t, --type=ERROR error type\n"); for ( i = 0; i < MCE_TABLE_SIZE; i++ ) @@ -438,7 +481,7 @@ int main(int argc, char *argv[]) } while ( 1 ) { -c = getopt_long(argc, argv, "c:Dd:t:hp:", opts, _index); +c = getopt_long(argc, argv, "c:Dd:t:hp:l", opts, _index); if ( c == -1 ) break; switch ( c ) { @@ -463,6 +506,9 @@ int main(int argc, char *argv[]) case 't': type = strtol(optarg, NULL, 0); break; +case 'l': +lmce = 1; +break; case 'h': default: help(); -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel