Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0 crash

2017-11-08 Thread Haozhong Zhang
On 11/07/17 01:37 -0700, Jan Beulich wrote:
> >>> On 07.11.17 at 09:23,  wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Tuesday, November 7, 2017 4:09 PM
> >> >>> On 07.11.17 at 02:37,  wrote:
> >> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> >> Sent: Monday, November 6, 2017 5:17 PM
> >> >> >>> On 03.11.17 at 09:29,  wrote:
> >> >> > We figured out the problem: some corner-case scripts triggered the
> >> >> > error injection at the same page (pfn 0x180020) twice, i.e.
> >> >> > "./xen-mceinj -t 0" was run more than once, which resulted in a
> >> >> > Dom0 crash.
> >> >>
> >> >> But isn't this a valid scenario, which shouldn't result in a kernel
> >> >> crash? What if two successive #MCs occurred for the same page?
> >> >> I.e. ...
> >> >>
> >> >
> >> > Yes, it's another valid scenario; the expected result is a kernel crash.
> >> 
> >> Kernel _crash_ or rather kernel _panic_? Of course without any kernel
> >> messages we can't tell one from the other, but to me this makes a
> >> difference nevertheless.
> >> 
> > Exactly, Dom0 crash.
> 
> I don't believe a crash is the expected outcome here.
>

This test case injects two errors to the same dom0 page. During the
first injection, offline_page() is called to set PGC_broken flag of
that page. During the second injection, offline_page() detects the
same broken page is touched again, and then tries to shut down the
page owner, i.e. dom0 in this case:

    /*
     * NB. When broken page belong to guest, usually hypervisor will
     * notify the guest to handle the broken page. However, hypervisor
     * need to prevent malicious guest access the broken page again.
     * Under such case, hypervisor shutdown guest, preventing recursive mce.
     */
    if ( (pg->count_info & PGC_broken) && (owner = page_get_owner(pg)) )
    {
        *status = PG_OFFLINE_AGAIN;
        domain_shutdown(owner, SHUTDOWN_crash);
        return 0;
    }

So I think the Dom0 crash and the subsequent machine reboot are the
expected behaviors here.

But it looks like an (unexpected) page fault happens during the reboot.
Xudong, can you check whether a normal reboot on that machine triggers
a page fault?

> > And I didn't see any "kernel panic" message from the log -- attaching
> > the original log again.
> 
> Well, as said - there's _no_ kernel log message at all, and hence we
> can't tell whether it's a crash or a plain panic. IIRC Xen's "Hardware
> Dom0 crashed" message can't distinguish the two cases.
> 

The crash is triggered in offline_page() before Xen can inject the
error to Dom0, so there is no dom0 kernel log around the crash.

This can be confirmed by dumping the call trace when
hwdom_shutdown(SHUTDOWN_crash) is called. Xudong, can you do this?

Thanks,
Haozhong

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op

2017-11-03 Thread Haozhong Zhang
On 11/03/17 15:40 +0800, Chao Peng wrote:
> 
> > +/*
> > + * Interface for NVDIMM management.
> > + */
> > +
> > +struct xen_sysctl_nvdimm_op {
> > +    uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
> > +    uint32_t pad; /* IN: Always zero. */
> 
> If alignment is the only concern, then 'err' can be moved here.
> 
> If it's designed for future and does not get used now, then it's better
> to check its value explicitly.
> 

I'll move 'err' to the position of 'pad'.



Re: [Xen-devel] [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0

2017-11-03 Thread Haozhong Zhang
On 11/03/17 14:51 +0800, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > ... to avoid the interference with the PMEM driver and management
> > utilities in Dom0.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
> > ---
> > Cc: Jan Beulich <jbeul...@suse.com>
> > Cc: Andrew Cooper <andrew.coop...@citrix.com>
> > Cc: Gang Wei <gang@intel.com>
> > Cc: Shane Wang <shane.w...@intel.com>
> > ---
> >  xen/arch/x86/acpi/power.c |  7 +++++++
> >  xen/arch/x86/dom0_build.c |  5 +++++
> >  xen/arch/x86/shutdown.c   |  3 +++
> >  xen/arch/x86/tboot.c      |  4 ++++
> >  xen/common/kexec.c        |  3 +++
> >  xen/common/pmem.c         | 21 +++++++++++++++++++++
> >  xen/drivers/acpi/nfit.c   | 21 +++++++++++++++++++++
> >  xen/include/xen/acpi.h    |  2 ++
> >  xen/include/xen/pmem.h    | 13 +++++++++++++
> >  9 files changed, 79 insertions(+)
> > 
> > diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
> > index 1e4e5680a7..d135715a49 100644
> > --- a/xen/arch/x86/acpi/power.c
> > +++ b/xen/arch/x86/acpi/power.c
> > @@ -178,6 +178,10 @@ static int enter_state(u32 state)
> >  
> >      freeze_domains();
> >  
> > +#ifdef CONFIG_NVDIMM_PMEM
> > +    acpi_nfit_reinstate();
> > +#endif
> 
> I don't understand why reinstate is needed for the NFIT table. Will it be
> searched by firmware on shutdown / when entering a power state?

I added these acpi_nfit_reinstate()'s akin to acpi_dmar_reinstate().
There are no public documents stating that the NFIT is not rebuilt
during power state changes.

Haozhong



Re: [Xen-devel] [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT

2017-11-03 Thread Haozhong Zhang
On 11/03/17 14:15 +0800, Chao Peng wrote:
> 
> > +static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
> > +{
> > +    struct nfit_spa_desc *spa_desc;
> > +    struct nfit_memdev_desc *memdev_desc;
> > +    struct acpi_nfit_system_address *spa;
> > +    unsigned long smfn, emfn;
> > +
> > +    list_for_each_entry(memdev_desc, &desc->memdev_list, link)
> > +    {
> > +        spa_desc = memdev_desc->spa_desc;
> > +
> > +        if ( !spa_desc ||
> > +             (memdev_desc->acpi_table->flags &
> > +              (ACPI_NFIT_MEM_SAVE_FAILED | ACPI_NFIT_MEM_RESTORE_FAILED |
> > +               ACPI_NFIT_MEM_FLUSH_FAILED | ACPI_NFIT_MEM_NOT_ARMED |
> > +               ACPI_NFIT_MEM_MAP_FAILED)) )
> > +            continue;
> 
> If failure is detected, is it reasonable to continue? I think we can
> at least print some messages.

I got something wrong here. I should iterate over the SPA structures
and check all memdevs in each SPA range. If any memdev contains
failure flags, the whole SPA range should be skipped and an error
message printed.

Haozhong

> 
> Chao
> > +
> > +        spa = spa_desc->acpi_table;
> > +        if ( memcmp(spa->range_guid, nfit_spa_pmem_guid, 16) )
> > +            continue;
> > +        smfn = paddr_to_pfn(spa->address);
> > +        emfn = paddr_to_pfn(spa->address + spa->length);
> > +        printk(XENLOG_INFO "NFIT: PMEM MFNs 0x%lx - 0x%lx\n", smfn, emfn);
> > +    }
> > +}



Re: [Xen-devel] [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable

2017-11-03 Thread Haozhong Zhang
On 11/03/17 13:58 +0800, Chao Peng wrote:
> 
> > +#ifdef CONFIG_NVDIMM_PMEM
> > +static void __init init_frametable_pmem_chunk(unsigned long s, unsigned long e)
> > +{
> > +    static unsigned long pmem_init_frametable_mfn;
> > +
> > +    ASSERT(!((s | e) & (PAGE_SIZE - 1)));
> > +
> > +    if ( !pmem_init_frametable_mfn )
> > +    {
> > +        pmem_init_frametable_mfn = alloc_boot_pages(1, 1);
> > +        if ( !pmem_init_frametable_mfn )
> > +            panic("Not enough memory for pmem initial frame table page");
> > +        memset(mfn_to_virt(pmem_init_frametable_mfn), -1, PAGE_SIZE);
> > +    }
> 
> Can zero_page be used instead?

No. I intend to make the frametable entries for NVDIMM as invalid at
boot time, in order to avoid/detect accidental accesses to NVDIMM
pages before they are registered to Xen hypervisor later (by part 2
patches 14 - 25).

> 
> > +
> > +    while ( s < e )
> > +    {
> > +        /*
> > +         * The real frame table entries of a pmem region will be
> > +         * created when the pmem region is registered to hypervisor.
> > +         * Any write attempt to the initial entries of that pmem
> > +         * region implies potential hypervisor bugs. In order to make
> > +         * those bugs explicit, map those initial entries as read-only.
> > +         */
> > +        map_pages_to_xen(s, pmem_init_frametable_mfn, 1, PAGE_HYPERVISOR_RO);
> > +        s += PAGE_SIZE;
> 
> I don't know how much impact 4K mappings have on boot time when pmem is
> very large. Perhaps we need to get such data on real hardware.
>

Well, it will be very slow because the size of NVDIMM is usually very
large (e.g. from hundreds of gigabytes to several terabytes). I can
make it use huge pages where possible.

> Another question is do we really need to map it, e.g. can we just skip
> the range here?

Sadly, I cannot recall why I did this. Maybe I can just leave the
frametable of NVDIMM unmapped, so accidental accesses to it would
trigger a page fault in the hypervisor, which makes bugs explicit as
well.


Haozhong



Re: [Xen-devel] [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()

2017-10-27 Thread Haozhong Zhang
On 10/27/17 14:49 +0800, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > The current check refuses the hot-plugged memory that falls in one
> > unused PDX group, which should be allowed.
> 
> Looks reasonable to me. The only thing I can think of is that you can
> double-check whether the following find_next_zero_bit()/find_next_bit()
> calls will still work.

The first check in mem_hotadd_check() ensures spfn < epfn, so sidx <=
eidx here. Compared with the previous code, the only added case is
sidx == eidx, which is exactly the case this patch intends to allow,
and which has been tested.

Haozhong

> 
> Chao
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
> > ---
> > Cc: Jan Beulich <jbeul...@suse.com>
> > Cc: Andrew Cooper <andrew.coop...@citrix.com>
> > ---
> >  xen/arch/x86/x86_64/mm.c | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> > 
> > diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> > index 11746730b4..6c5221f90c 100644
> > --- a/xen/arch/x86/x86_64/mm.c
> > +++ b/xen/arch/x86/x86_64/mm.c
> > @@ -1296,12 +1296,8 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
> >          return 0;
> >  
> >      /* Make sure the new range is not present now */
> > -    sidx = ((pfn_to_pdx(spfn) + PDX_GROUP_COUNT - 1)  & ~(PDX_GROUP_COUNT - 1))
> > -           / PDX_GROUP_COUNT;
> > +    sidx = (pfn_to_pdx(spfn) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;
> >      eidx = (pfn_to_pdx(epfn - 1) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;
> > -    if (sidx >= eidx)
> > -        return 0;
> > -
> >      s = find_next_zero_bit(pdx_group_valid, eidx, sidx);
> >      if ( s > eidx )
> >          return 0;



Re: [Xen-devel] [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains

2017-10-26 Thread Haozhong Zhang
On 10/27/17 11:26 +0800, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > Overview
> > ========
> > 
> > (RFC v2 can be found at
> > https://lists.xen.org/archives/html/xen-devel/2017-03/msg02401.html)
> > 
> > Well, this RFC v3 has changed and grown a lot since the previous
> > versions. The primary changes are listed below; most of them are to
> > simplify the first implementation and avoid further growth.
> > 
> > 1. Drop the support to maintain the frametable and M2P table of PMEM
> >    in RAM. In the future, we may add this support back.
> 
> I don't find any discussion in v2 about this, but I'm thinking putting
> those Xen data structures in RAM sometimes is useful (e.g. when
> performance is important). It's better not to make a hard restriction
> on this.

Well, this is to reduce the complexity; as you can see, the current
patch series is already too big. In addition, the size of NVDIMM can
be very large, e.g. several terabytes or more, which would require a
large amount of RAM to store its frametable and M2P table (~10 MB per
1 GB) and leave less RAM for guest usage.

> 
> > 
> > 2. Hide host NFIT and deny access to host PMEM from Dom0. In other
> >    words, the kernel NVDIMM driver is no longer loaded in Dom0 and
> >    existing management utilities (e.g. ndctl) do not work in Dom0
> >    anymore. This is to work around the interference of PMEM access
> >    between Dom0 and the Xen hypervisor. In the future, we may add a
> >    stub driver in Dom0 which
> >    will hold the PMEM pages being used by Xen hypervisor and/or other
> >    domains.
> > 
> > 3. As there is no NVDIMM driver and management utilities in Dom0 now,
> >    we cannot easily specify an area of host NVDIMM (e.g., by /dev/pmem0)
> >    and manage NVDIMM in Dom0 (e.g., creating labels).  Instead, we
> >    have to specify the exact MFNs of host PMEM pages in xl domain
> >    configuration files and the newly added Xen NVDIMM management
> >    utility xen-ndctl.
> > 
> >    If there are indeed some tasks that have to be handled by existing
> >    driver and management utilities, such as recovery from hardware
> >    failures, they have to be accomplished out of Xen environment.
> 
> What kind of recovery can happen, and can the recovery happen at
> runtime? For example, can we recover a portion of NVDIMM assigned to a
> certain VM while keeping other VMs still using NVDIMM?

For example, evaluate ACPI _DSM (maybe vendor specific) for error
recovery and/or scrubbing bad blocks, etc.

> 
> > 
> >    After 2. is solved in the future, we would be able to make existing
> >    driver and management utilities work in Dom0 again.
> 
> Is there any reason why we can't do it now? If the existing ndctl (with
> additional patches) can work, then we don't need to introduce xen-ndctl
> anymore. I think that keeps the user interface clearer.

The simple reason is that I want to reduce the components (Xen/kernel/QEMU)
touched by the first patchset (whose primary target is to implement
the basic functionality, i.e. mapping host NVDIMM to guest as a
virtual NVDIMM). As you said, leaving a driver (the nvdimm driver
and/or a stub driver) in Dom0 would make the user interface
clearer. Let's see what I can get in the next version.

Thanks,
Haozhong



Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest

2017-10-17 Thread Haozhong Zhang
On 10/17/17 13:45 +0200, Paolo Bonzini wrote:
> On 14/10/2017 00:46, Stefano Stabellini wrote:
> > On Fri, 13 Oct 2017, Jan Beulich wrote:
> > On 13.10.17 at 13:13,  wrote:
> >>> To Jan, Andrew, Stefano and Anthony,
> >>>
> >>> what do you think about allowing QEMU to build the entire guest ACPI
> >>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> >>> still there and just bypassed in this case.
> >> Well, if that can be made to work in a non-quirky way and without
> >> loss of functionality, I'd probably be fine. I do think, however,
> >> that there's a reason this is being handled in hvmloader right now.
> > And not to discourage you, just as a clarification, you'll also need to
> > consider backward compatibility: unless the tables are identical, I
> > imagine we'll have to keep using the old tables for already installed
> > virtual machines.
> 
> I agree.  Some of them are already identical, some are not but the QEMU
> version should be okay, and for yet more it's probably better to keep
> the Xen-specific parts in hvmloader.
> 
> The good thing is that it's possible to proceed incrementally once you
> have the hvmloader support for merging the QEMU and hvmloader RSDT or
> XSDT (whatever you are using), starting with just NVDIMM and proceeding
> later with whatever you see fit.
> 

I'll have a try and check how much the differences matter. If it does
not take too much work, I'd like to adapt the Xen NVDIMM enabling
patches to the fully QEMU-built ACPI. Otherwise, I'll fall back to
Paolo's and MST's suggestions.

Thanks,
Haozhong



Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest

2017-10-13 Thread Haozhong Zhang
On 10/13/17 10:44 +0200, Igor Mammedov wrote:
> On Fri, 13 Oct 2017 15:53:26 +0800
> Haozhong Zhang <haozhong.zh...@intel.com> wrote:
> 
> > On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> > > On 12/10/2017 14:45, Haozhong Zhang wrote:  
> > > > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > > > /rom@etc/table-loader. The former is unstructured to guest, and
> > > > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > > > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > > > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > > > of specified area, and fill guest address in specified ACPI field.
> > > > 
> > > > One part of my patches is to implement a mechanism to tell Xen which
> > > > part of ACPI data is a table (NFIT), and which part defines a
> > > > namespace device and what the device name is. I can add two new loader
> > > > commands for them respectively.
> > > > 
> > > > Because they just provide information and SeaBIOS in non-xen
> > > > environment ignores unrecognized commands, they will not break SeaBIOS
> > > > in non-xen environment.
> > > > 
> > > > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > > > dropped, and replaced by adding the new loader commands (though they
> > > > may be used only by Xen).
> > > > 
> > > > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > > > are needed in, perhaps, hvmloader.  
> > > 
> > > If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> > > to process a reduced set of ACPI tables.  In other words,
> > > etc/acpi/tables would only include the NFIT, the SSDT with namespace
> > > devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as 
> > > usual.
> > >
> > > hvmloader can then:
> > > 
> > > 1) allocate some memory for where the XSDT will go
> > > 
> > > 2) process the BIOSLinkerLoader like SeaBIOS would do
> > > 
> > > 3) find the RSDP in low memory, since the loader script must have placed
> > > it there.  If it cannot find it, allocate some low memory, fill it with
> > > the RSDP header and revision, and jump to step 6
> > > 
> > > 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> > > 
> > > 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> > > 
> > > 6) build hvmloader tables and link them into the RSDT and/or XSDT as 
> > > usual.
> > > 
> > > 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> > > RSDT and/or XSDT, and update the checksums
> > > 
> > > QEMU's XSDT remains there somewhere in memory, unused but harmless.
> > >   
> +1 to Paolo's suggestion, i.e.
>  1. add BIOSLinkerLoader into hvmloader
>  2. load/process QEMU's tables with #1
>  3. get pointers to QEMU generated NFIT and NVDIMM SSDT from QEMU's RSDT/XSDT
> and put them in hvmloader's RSDT
> 
> > It can work for plain tables which do not contain AML.
> > 
> > However, for a namespace device, Xen needs to know its name in order
> > to detect the potential name conflict with those used in Xen built
> > ACPI. Xen does not (and is not going to) introduce an AML parser, so
> > it cannot get those device names from QEMU built ACPI by its own.
> > 
> > The idea of either this patch series or the new BIOSLinkerLoader
> > command is to let QEMU tell Xen where the definition body of a
> > namespace device (i.e. that part within the outmost "Device(NAME)") is
> > and what the device name is. Xen, after the name conflict check, can
> > re-package the definition body in a namespace device (w/ minimal AML
> > builder code added in Xen) and then in SSDT.
> 
> I'd skip the conflict check at runtime, as hvmloader doesn't currently
> have a "\\_SB\NVDR" device; instead of a runtime check it might do a
> primitive check at build time that the ASL sources in hvmloader do not
> contain the "NVDR" keyword reserved for QEMU, to avoid its accidental
> addition in the future (it also might be reused in future if some
> other tables from QEMU will be reused).
> It's a bit hackish, but at least it does the job and keeps the
> BIOSLinkerLoader interface the same for all supported firmwares
> (I'd consider it a temporary hack on the way to fully QEMU-built
> ACPI tables for Xen).
> 
> Ideally it wo

Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest

2017-10-13 Thread Haozhong Zhang
On 10/12/17 13:39 -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Oct 12, 2017 at 08:45:44PM +0800, Haozhong Zhang wrote:
> > On 10/10/17 12:05 -0400, Konrad Rzeszutek Wilk wrote:
> > > On Tue, Sep 12, 2017 at 11:15:09AM +0800, Haozhong Zhang wrote:
> > > > On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> > > > > CC'ing xen-devel, and the Xen tools and x86 maintainers.
> > > > > 
> > > > > On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > > > > > On Mon, 11 Sep 2017 12:41:47 +0800
> > > > > > Haozhong Zhang <haozhong.zh...@intel.com> wrote:
> > > > > > 
> > > > > > > This is the QEMU part patches that works with the associated Xen
> > > > > > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies 
> > > > > > > on
> > > > > > > QEMU to build guest NFIT and NVDIMM namespace devices, and 
> > > > > > > allocate
> > > > > > > guest address space for vNVDIMM devices.
> > > > > > > 
> > > > > > > All patches can be found at
> > > > > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > > > > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > > > > > 
> > > > > > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > > > > > label data, as the Xen side support for labels is not implemented 
> > > > > > > yet.
> > > > > > > 
> > > > > > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a 
> > > > > > > hotplug
> > > > > > > memory region for Xen guest, in order to make the existing nvdimm
> > > > > > > device plugging path work on Xen.
> > > > > > > 
> > > > > > > Patch 4 - 10 build and copy NFIT from QEMU to Xen guest, when 
> > > > > > > QEMU is
> > > > > > > used as the Xen device model.
> > > > > > 
> > > > > > I've skimmed over patch-set and can't say that I'm happy with
> > > > > > number of xen_enabled() invariants it introduced as well as
> > > > > > with partial blobs it creates.
> > > > > 
> > > > > I have not read the series (Haozhong, please CC me, Anthony and
> > > > > xen-devel to the whole series next time), but yes, indeed. Let's not 
> > > > > add
> > > > > more xen_enabled() if possible.
> > > > > 
> > > > > Haozhong, was there a design document thread on xen-devel about this? 
> > > > > If
> > > > > so, did it reach a conclusion? Was the design accepted? If so, please
> > > > > add a link to the design doc in the introductory email, so that
> > > > > everybody can read it and be on the same page.
> > > > 
> > > > Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
> > > > the guest ACPI.
> > > > 
> > > > [1] 
> > > > https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html
> > > 
> > > Igor, did you have a chance to read it?
> > > 
> > > .. see below
> > > > 
> > > > > 
> > > > > 
> > > > > > I'd like to reduce above and a way to do this might be making xen 
> > > > > >  1. use fw_cfg
> > > > > >  2. fetch QEMU build acpi tables from fw_cfg
> > > > > >  3. extract nvdimm tables (which is trivial) and use them
> > > > > > 
> > > > > > looking at xen_load_linux(), it seems possible to use fw_cfg.
> > > > > > 
> > > > > > So what's stopping xen from using it elsewhere?,
> > > > > > instead of adding more xen specific code to do 'the same'
> > > > > > job and not reusing/sharing common code with tcg/kvm.
> > > > > 
> > > > > So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> > > > > rely on a firmware-like application called "hvmloader" that runs in
> > > > > guest context and generates the ACPI tables. I have no opinions on
> > > > > hvmloader and I'll let the Xen maintainers talk about it. However, 
> > > > > keep
> > > > > in mind that with an HVM guest some devices are emulated 

Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest

2017-10-13 Thread Haozhong Zhang
On 10/12/17 17:45 +0200, Paolo Bonzini wrote:
> On 12/10/2017 14:45, Haozhong Zhang wrote:
> > Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> > /rom@etc/table-loader. The former is unstructured to guest, and
> > contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> > organized as a set of commands, which direct the guest (e.g., SeaBIOS
> > on KVM/QEMU) to relocate data in the former file, recalculate checksum
> > of specified area, and fill guest address in specified ACPI field.
> > 
> > One part of my patches is to implement a mechanism to tell Xen which
> > part of ACPI data is a table (NFIT), and which part defines a
> > namespace device and what the device name is. I can add two new loader
> > commands for them respectively.
> > 
> > Because they just provide information and SeaBIOS in non-xen
> > environment ignores unrecognized commands, they will not break SeaBIOS
> > in non-xen environment.
> > 
> > On QEMU side, most Xen-specific hacks in ACPI builder could be
> > dropped, and replaced by adding the new loader commands (though they
> > may be used only by Xen).
> > 
> > On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> > are needed in, perhaps, hvmloader.
> 
> If Xen has to parse BIOSLinkerLoader, it can use the existing commands
> to process a reduced set of ACPI tables.  In other words,
> etc/acpi/tables would only include the NFIT, the SSDT with namespace
> devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.
>
> hvmloader can then:
> 
> 1) allocate some memory for where the XSDT will go
> 
> 2) process the BIOSLinkerLoader like SeaBIOS would do
> 
> 3) find the RSDP in low memory, since the loader script must have placed
> it there.  If it cannot find it, allocate some low memory, fill it with
> the RSDP header and revision, and jump to step 6
> 
> 4) If it found QEMU's RSDP, use it to find QEMU's XSDT
> 
> 5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.
> 
> 6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.
> 
> 7) overwrite the RSDP in low memory with a pointer to hvmloader's own
> RSDT and/or XSDT, and update the checksums
> 
> QEMU's XSDT remains there somewhere in memory, unused but harmless.
> 

It can work for plain tables which do not contain AML.

However, for a namespace device, Xen needs to know its name in order
to detect the potential name conflict with those used in Xen built
ACPI. Xen does not (and is not going to) introduce an AML parser, so
it cannot get those device names from QEMU built ACPI by its own.

The idea of either this patch series or the new BIOSLinkerLoader
command is to let QEMU tell Xen where the definition body of a
namespace device (i.e. that part within the outmost "Device(NAME)") is
and what the device name is. Xen, after the name conflict check, can
re-package the definition body in a namespace device (w/ minimal AML
builder code added in Xen) and then in SSDT.


Haozhong



Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest

2017-10-12 Thread Haozhong Zhang
On 10/10/17 12:05 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 12, 2017 at 11:15:09AM +0800, Haozhong Zhang wrote:
> > On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> > > CC'ing xen-devel, and the Xen tools and x86 maintainers.
> > > 
> > > On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > > > On Mon, 11 Sep 2017 12:41:47 +0800
> > > > Haozhong Zhang <haozhong.zh...@intel.com> wrote:
> > > > 
> > > > > This is the QEMU part patches that works with the associated Xen
> > > > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > > > guest address space for vNVDIMM devices.
> > > > > 
> > > > > All patches can be found at
> > > > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > > > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > > > 
> > > > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > > > label data, as the Xen side support for labels is not implemented yet.
> > > > > 
> > > > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > > > > memory region for Xen guest, in order to make the existing nvdimm
> > > > > device plugging path work on Xen.
> > > > > 
> > > > > > Patch 4 - 10 build and copy NFIT from QEMU to Xen guest, when QEMU is
> > > > > used as the Xen device model.
> > > > 
> > > > I've skimmed over patch-set and can't say that I'm happy with
> > > > number of xen_enabled() invariants it introduced as well as
> > > > with partial blobs it creates.
> > > 
> > > I have not read the series (Haozhong, please CC me, Anthony and
> > > xen-devel to the whole series next time), but yes, indeed. Let's not add
> > > more xen_enabled() if possible.
> > > 
> > > Haozhong, was there a design document thread on xen-devel about this? If
> > > so, did it reach a conclusion? Was the design accepted? If so, please
> > > add a link to the design doc in the introductory email, so that
> > > everybody can read it and be on the same page.
> > 
> > Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
> > the guest ACPI.
> > 
> > [1] 
> > https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html
> 
> Igor, did you have a chance to read it?
> 
> .. see below
> > 
> > > 
> > > 
> > > > I'd like to reduce above and a way to do this might be making xen 
> > > >  1. use fw_cfg
> > > >  2. fetch QEMU build acpi tables from fw_cfg
> > > >  3. extract nvdimm tables (which is trivial) and use them
> > > > 
> > > > looking at xen_load_linux(), it seems possible to use fw_cfg.
> > > > 
> > > > So what's stopping xen from using it elsewhere?,
> > > > instead of adding more xen specific code to do 'the same'
> > > > job and not reusing/sharing common code with tcg/kvm.
> > > 
> > > So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> > > rely on a firmware-like application called "hvmloader" that runs in
> > > guest context and generates the ACPI tables. I have no opinions on
> > > hvmloader and I'll let the Xen maintainers talk about it. However, keep
> > > in mind that with an HVM guest some devices are emulated by Xen and/or
> > > by other device emulators that can run alongside QEMU. QEMU doesn't have
> > > a full view of the system.
> > > 
> > > Here the question is: does it have to be QEMU the one to generate the
> > > ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> > > like the rest, instead of introducing this split-brain design about
> > > ACPI. We need to see a design doc to fully understand this.
> > >
> > 
> > hvmloader runs in the guest and is responsible to build/load guest
> > ACPI. However, it's not capable of building AML at runtime (for lack
> > of an AML builder). If any guest ACPI object is needed (e.g. by guest
> > DSDT), it has to be generated from ASL by iasl at Xen compile time and
> > then be loaded by hvmloader at runtime.
> > 
> > Xen includes an OperationRegion "BIOS" in the statically generated guest
> > DSDT, whose address is hardcoded and which contains a list of values
> > fi

[Xen-devel] [PATCH v2] VT-d: use two 32-bit writes to update DMAR fault address registers

2017-10-10 Thread Haozhong Zhang
The 64-bit DMAR fault address is composed of two 32-bit registers,
DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to the VT-d spec:
"Software is expected to access 32-bit registers as aligned doublewords",
a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG and
DMAR_FEUADDR_REG separately in order to update a 64-bit fault address,
rather than a single 64-bit write to DMAR_FEADDR_REG. Note that when
x2APIC is not enabled, DMAR_FEUADDR_REG is reserved and it's not
necessary to update it.

Though I haven't seen any errors caused by such a 64-bit write on
real machines, it's still better to follow the specification.

Fixes: ae05fd3912b ("VT-d: use qword MMIO access for MSI address writes")
Reviewed-by: Roger Pau Monné <roger@citrix.com>
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Changes in v2:
 * Explain in commit message and code comment why not updating DMAR_FEUADDR_REG
   when x2APIC is not enabled

This patch actually reverts part of commit ae05fd3912b
("VT-d: use qword MMIO access for MSI address writes"). The latter
was included in XSA-120, 128..131 follow-up patch series [1]. I
don't know whether my patch breaks those XSA fixes. If it does,
please drop my patch.

[1] https://lists.xenproject.org/archives/html/xen-devel/2015-06/msg00638.html
---
 xen/drivers/passthrough/vtd/iommu.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index daaed0abbd..81dd2085c7 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1105,7 +1105,13 @@ static void dma_msi_set_affinity(struct irq_desc *desc, const cpumask_t *mask)
 
     spin_lock_irqsave(&iommu->register_lock, flags);
     dmar_writel(iommu->reg, DMAR_FEDATA_REG, msg.data);
-    dmar_writeq(iommu->reg, DMAR_FEADDR_REG, msg.address);
+    dmar_writel(iommu->reg, DMAR_FEADDR_REG, msg.address_lo);
+    /*
+     * When x2APIC is not enabled, DMAR_FEUADDR_REG is reserved and
+     * it's not necessary to update it.
+     */
+    if (x2apic_enabled)
+        dmar_writel(iommu->reg, DMAR_FEUADDR_REG, msg.address_hi);
     spin_unlock_irqrestore(&iommu->register_lock, flags);
 }
 
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] vt-d: use two 32-bit writes to update DMAR fault address registers

2017-09-18 Thread Haozhong Zhang
On 09/18/17 02:30 -0600, Jan Beulich wrote:
> >>> On 18.09.17 at 10:18,  wrote:
> >>  From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Monday, September 11, 2017 6:03 PM
> >> 
> >> >>> On 11.09.17 at 08:00,  wrote:
> >> > The 64-bit DMAR fault address is composed of two 32 bits registers
> >> > DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to VT-d spec:
> >> > "Software is expected to access 32-bit registers as aligned doublewords",
> >> > a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG and
> >> > DMAR_FEUADDR_REG separately in order to update a 64-bit fault
> >> address,
> >> > rather than a 64-bit write to DMAR_FEADDR_REG.
> >> >
> >> > Though I haven't seen any errors caused by such one 64-bit write on
> >> > real machines, it's still better to follow the specification.
> >> 
> >> Any sane chipset should split qword accesses into dword ones if
> >> they can't be handled at some layer. Also if you undo something
> >> explicitly done by an earlier commit, please quote that commit
> >> and say what was wrong. After all Kevin as the VT-d maintainer
> >> agreed with the change back then.
> > 
> > I'm OK with this change.
> 
> Hmm, would you mind explaining? You were also okay with the
> change in the opposite direction back then, and we've had no
> reports of problems.
> 

I haven't seen any issues of the current 64-bit write on recent Intel
Haswell, Broadwell and Skylake Xeon platforms, so I guess the hardware
can properly handle the 64-bits write to contiguous 32-bit registers.

I actually encountered errors when running Xen on KVM/QEMU with QEMU
vIOMMU enabled, which (QEMU) disallows 64-bit writes to 32-bit
registers and aborts if such writes happen.

If this patch is considered senseless (as it does not fix any errors
on real hardware), I'm fine to fix the above abort on QEMU side (i.e.,
let vIOMMU in QEMU follow the behavior of real hardware).


Haozhong


Re: [Xen-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest

2017-09-11 Thread Haozhong Zhang
On 09/11/17 11:52 -0700, Stefano Stabellini wrote:
> CC'ing xen-devel, and the Xen tools and x86 maintainers.
> 
> On Mon, 11 Sep 2017, Igor Mammedov wrote:
> > On Mon, 11 Sep 2017 12:41:47 +0800
> > Haozhong Zhang <haozhong.zh...@intel.com> wrote:
> > 
> > > This is the QEMU part patches that works with the associated Xen
> > > patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
> > > QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
> > > guest address space for vNVDIMM devices.
> > > 
> > > All patches can be found at
> > >   Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
> > >   QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3
> > > 
> > > Patch 1 is to avoid dereferencing the NULL pointer to non-existing
> > > label data, as the Xen side support for labels is not implemented yet.
> > > 
> > > Patch 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
> > > memory region for Xen guest, in order to make the existing nvdimm
> > > device plugging path work on Xen.
> > > 
> > > Patch 4 - 10 build and copy NFIT from QEMU to Xen guest, when QEMU is
> > > used as the Xen device model.
> > 
> > I've skimmed over patch-set and can't say that I'm happy with
> > number of xen_enabled() invariants it introduced as well as
> > with partial blobs it creates.
> 
> I have not read the series (Haozhong, please CC me, Anthony and
> xen-devel to the whole series next time), but yes, indeed. Let's not add
> more xen_enabled() if possible.
> 
> Haozhong, was there a design document thread on xen-devel about this? If
> so, did it reach a conclusion? Was the design accepted? If so, please
> add a link to the design doc in the introductory email, so that
> everybody can read it and be on the same page.

Yes, there is a design [1] discussed and reviewed. Section 4.3 discussed
the guest ACPI.

[1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html

> 
> 
> > I'd like to reduce above and a way to do this might be making xen 
> >  1. use fw_cfg
> >  2. fetch QEMU build acpi tables from fw_cfg
> >  3. extract nvdimm tables (which is trivial) and use them
> > 
> > looking at xen_load_linux(), it seems possible to use fw_cfg.
> > 
> > So what's stopping xen from using it elsewhere?,
> > instead of adding more xen specific code to do 'the same'
> > job and not reusing/sharing common code with tcg/kvm.
> 
> So far, ACPI tables have not been generated by QEMU. Xen HVM machines
> rely on a firmware-like application called "hvmloader" that runs in
> guest context and generates the ACPI tables. I have no opinions on
> hvmloader and I'll let the Xen maintainers talk about it. However, keep
> in mind that with an HVM guest some devices are emulated by Xen and/or
> by other device emulators that can run alongside QEMU. QEMU doesn't have
> a full view of the system.
> 
> Here the question is: does it have to be QEMU the one to generate the
> ACPI blobs for the nvdimm? It would be nicer if it was up to hvmloader
> like the rest, instead of introducing this split-brain design about
> ACPI. We need to see a design doc to fully understand this.
>

hvmloader runs in the guest and is responsible for building/loading
guest ACPI. However, it cannot build AML at runtime (for lack of an
AML builder). If any guest ACPI object is needed (e.g. by the guest
DSDT), it has to be generated from ASL by iasl at Xen compile time and
then loaded by hvmloader at runtime.

Xen includes an OperationRegion "BIOS" in the statically generated
guest DSDT, whose address is hardcoded and which contains a list of
values filled in by hvmloader at runtime. Other ACPI objects can refer
to those values (e.g., the number of vCPUs). But this is not enough
for generating guest NVDIMM ACPI objects at compile time and then
customizing and loading them via hvmloader, because their structure
(i.e., the number of namespace devices) cannot be decided until the
guest config is known.

Alternatively, we may introduce an AML builder in hvmloader and build
all guest ACPI completely in hvmloader. Looking at the similar
implementation in QEMU, it would not be small, compared to the current
size of hvmloader. Besides, I'm still going to let QEMU handle guest
NVDIMM _DSM and _FIT calls, which is another reason I use QEMU to
build NVDIMM ACPI.

> If the design doc thread led into thinking that it has to be QEMU to
> generate them, then would it make the code nicer if we used fw_cfg to
> get the (full or partial) tables from QEMU, as Igor suggested?

I'll have a look at the code (which I didn't notice) pointed by Igor.

One possible issue t

Re: [Xen-devel] [PATCH] vt-d: use two 32-bit writes to update DMAR fault address registers

2017-09-11 Thread Haozhong Zhang
On 09/11/17 10:38 +0100, Roger Pau Monné wrote:
> On Mon, Sep 11, 2017 at 02:00:48PM +0800, Haozhong Zhang wrote:
> > The 64-bit DMAR fault address is composed of two 32 bits registers
> > DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to VT-d spec:
> > "Software is expected to access 32-bit registers as aligned doublewords",
> > a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG and
> > DMAR_FEUADDR_REG separately in order to update a 64-bit fault address,
> > rather than a 64-bit write to DMAR_FEADDR_REG.
> > 
> > Though I haven't seen any errors caused by such one 64-bit write on
> > real machines, it's still better to follow the specification.
> 
> Either the patch description is missing something or the patch is
> wrong. You should mention why is the write to the high part of the
> address now conditional on x2APIC being enabled, when it didn't use to
> be before.
>

When x2APIC is disabled, DMAR_FEUADDR_REG is reserved and it's not
necessary to update it. The original code always writes zero to it in
that case, which is also correct.

Haozhong

> [...]
> > -    dmar_writeq(iommu->reg, DMAR_FEADDR_REG, msg.address);
> > +    dmar_writel(iommu->reg, DMAR_FEADDR_REG, msg.address_lo);
> > +    if (x2apic_enabled)
> > +        dmar_writel(iommu->reg, DMAR_FEUADDR_REG, msg.address_hi);
> >      spin_unlock_irqrestore(&iommu->register_lock, flags);
> 
> Thanks, Roger.



[Xen-devel] [PATCH 6/6] x86/mce: remove extra blanks in mctelem.c

2017-09-11 Thread Haozhong Zhang
The entire file of mctelem.c is in Linux coding style, so do not
change the coding style and only remove trailing spaces and extra
blank lines.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
 xen/arch/x86/cpu/mcheck/mctelem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c b/xen/arch/x86/cpu/mcheck/mctelem.c
index b63e559d4d..492e2af77f 100644
--- a/xen/arch/x86/cpu/mcheck/mctelem.c
+++ b/xen/arch/x86/cpu/mcheck/mctelem.c
@@ -220,7 +220,7 @@ void mctelem_process_deferred(unsigned int cpu,
int ret;
 
/*
-* First, unhook the list of telemetry structures, and  
+* First, unhook the list of telemetry structures, and
 * hook it up to the processing list head for this CPU.
 *
 * If @lmce is true and a non-local MC# occurs before the
@@ -339,7 +339,7 @@ void __init mctelem_init(unsigned int datasz)
 {
char *datarr;
unsigned int i;
-   
+
BUILD_BUG_ON(MC_URGENT != 0 || MC_NONURGENT != 1 || MC_NCLASSES != 2);
 
datasz = (datasz & ~0xf) + 0x10;/* 16 byte roundup */
-- 
2.11.0




[Xen-devel] [PATCH 5/6] x86/mce: add emacs block to mctelem.c

2017-09-11 Thread Haozhong Zhang
mctelem.c uses the tab indention. Add an emacs block to avoid mixed
indention styles in certain editors.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
 xen/arch/x86/cpu/mcheck/mctelem.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c b/xen/arch/x86/cpu/mcheck/mctelem.c
index b144a66053..b63e559d4d 100644
--- a/xen/arch/x86/cpu/mcheck/mctelem.c
+++ b/xen/arch/x86/cpu/mcheck/mctelem.c
@@ -550,3 +550,13 @@ void mctelem_ack(mctelem_class_t which, mctelem_cookie_t cookie)
wmb();
	spin_unlock(&processing_lock);
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: t
+ * tab-width: 8
+ * End:
+ */
-- 
2.11.0




[Xen-devel] [PATCH 2/6] x86/vmce: adapt vmce.c to Xen hypervisor coding style

2017-09-11 Thread Haozhong Zhang
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
 xen/arch/x86/cpu/mcheck/vmce.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index 9c460c7c6c..e07cd2feef 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -185,7 +185,7 @@ int vmce_rdmsr(uint32_t msr, uint64_t *val)
 {
 case MSR_IA32_MCG_STATUS:
 *val = cur->arch.vmce.mcg_status;
-if (*val)
+if ( *val )
 mce_printk(MCE_VERBOSE,
"MCE: %pv: rd MCG_STATUS %#"PRIx64"\n", cur, *val);
 break;
@@ -354,7 +354,8 @@ static int vmce_save_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
 struct vcpu *v;
 int err = 0;
 
-for_each_vcpu( d, v ) {
+for_each_vcpu ( d, v )
+{
 struct hvm_vmce_vcpu ctxt = {
 .caps = v->arch.vmce.mcg_cap,
 .mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2,
-- 
2.11.0




[Xen-devel] [PATCH 4/6] x86/mce: adapt mce_intel.c to Xen hypervisor coding style

2017-09-11 Thread Haozhong Zhang
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
 xen/arch/x86/cpu/mcheck/mce_intel.c | 262 +++-
 1 file changed, 142 insertions(+), 120 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c
index 4c001b407f..e5dd956a24 100644
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -7,7 +7,7 @@
 #include 
 #include 
 #include 
-#include  
+#include 
 #include 
 #include 
 #include 
@@ -64,7 +64,7 @@ static void intel_thermal_interrupt(struct cpu_user_regs *regs)
 
 ack_APIC_irq();
 
-if (NOW() < per_cpu(next, cpu))
+if ( NOW() < per_cpu(next, cpu) )
 return;
 
 per_cpu(next, cpu) = NOW() + MILLISECS(5000);
@@ -78,17 +78,16 @@ static void intel_thermal_interrupt(struct cpu_user_regs *regs)
 printk(KERN_EMERG "CPU%u: Temperature above threshold\n", cpu);
 printk(KERN_EMERG "CPU%u: Running in modulated clock mode\n", cpu);
 add_taint(TAINT_MACHINE_CHECK);
-} else {
+} else
 printk(KERN_INFO "CPU%u: Temperature/speed normal\n", cpu);
-}
 }
 
 /* Thermal monitoring depends on APIC, ACPI and clock modulation */
 static bool intel_thermal_supported(struct cpuinfo_x86 *c)
 {
-if (!cpu_has_apic)
+if ( !cpu_has_apic )
 return false;
-if (!cpu_has(c, X86_FEATURE_ACPI) || !cpu_has(c, X86_FEATURE_TM1))
+if ( !cpu_has(c, X86_FEATURE_ACPI) || !cpu_has(c, X86_FEATURE_TM1) )
 return false;
 return true;
 }
@@ -102,7 +101,7 @@ static void __init mcheck_intel_therm_init(void)
  * LVT value on BSP and use that value to restore APs' thermal LVT
  * entry BIOS programmed later
  */
-if (intel_thermal_supported(&boot_cpu_data))
+if ( intel_thermal_supported(&boot_cpu_data) )
 lvtthmr_init = apic_read(APIC_LVTTHMR);
 }
 
@@ -115,7 +114,7 @@ static void intel_init_thermal(struct cpuinfo_x86 *c)
 unsigned int cpu = smp_processor_id();
 static uint8_t thermal_apic_vector;
 
-if (!intel_thermal_supported(c))
+if ( !intel_thermal_supported(c) )
 return; /* -ENODEV */
 
 /* first check if its enabled already, in which case there might
@@ -134,23 +133,25 @@ static void intel_init_thermal(struct cpuinfo_x86 *c)
  * BIOS has programmed on AP based on BSP's info we saved (since BIOS
  * is required to set the same value for all threads/cores).
  */
-if ((val & APIC_MODE_MASK) != APIC_DM_FIXED
-|| (val & APIC_VECTOR_MASK) > 0xf)
+if ( (val & APIC_MODE_MASK) != APIC_DM_FIXED
+ || (val & APIC_VECTOR_MASK) > 0xf )
 apic_write(APIC_LVTTHMR, val);
 
-if ((msr_content & (1ULL<<3))
-&& (val & APIC_MODE_MASK) == APIC_DM_SMI) {
-if (c == &boot_cpu_data)
+if ( (msr_content & (1ULL<<3))
+ && (val & APIC_MODE_MASK) == APIC_DM_SMI )
+{
+if ( c == &boot_cpu_data )
 printk(KERN_DEBUG "Thermal monitoring handled by SMI\n");
 return; /* -EBUSY */
 }
 
-if (cpu_has(c, X86_FEATURE_TM2) && (msr_content & (1ULL << 13)))
+if ( cpu_has(c, X86_FEATURE_TM2) && (msr_content & (1ULL << 13)) )
 tm2 = 1;
 
 /* check whether a vector already exists, temporarily masked? */
-if (val & APIC_VECTOR_MASK) {
-if (c == &boot_cpu_data)
+if ( val & APIC_VECTOR_MASK )
+{
+if ( c == &boot_cpu_data )
 printk(KERN_DEBUG "Thermal LVT vector (%#x) already installed\n",
val & APIC_VECTOR_MASK);
 return; /* -EBUSY */
@@ -170,9 +171,9 @@ static void intel_init_thermal(struct cpuinfo_x86 *c)
 wrmsrl(MSR_IA32_MISC_ENABLE, msr_content | (1ULL<<3));
 
 apic_write(APIC_LVTTHMR, val & ~APIC_LVT_MASKED);
-if (opt_cpu_info)
+if ( opt_cpu_info )
 printk(KERN_INFO "CPU%u: Thermal monitoring enabled (%s)\n",
-cpu, tm2 ? "TM2" : "TM1");
+   cpu, tm2 ? "TM2" : "TM1");
 return;
 }
 #endif /* CONFIG_X86_MCE_THERMAL */
@@ -181,7 +182,8 @@ static void intel_init_thermal(struct cpuinfo_x86 *c)
 static inline void intel_get_extended_msr(struct mcinfo_extended *ext, u32 msr)
 {
 if ( ext->mc_msrs < ARRAY_SIZE(ext->mc_msr)
- && msr < MSR_IA32_MCG_EAX + nr_intel_ext_msrs ) {
+ && msr < MSR_IA32_MCG_EAX + nr_intel_ext_msrs )
+{
 ext->mc_msr[ext->mc_msrs].reg = msr;
 rdmsrl(msr, ext->mc_msr[ext->mc_msrs].value);
 ++ext->mc_msrs;
@@ -199,21 +201,21 @@ intel_get_extended_msrs(struct mcinfo_global *mig, struct mc_info *mi)
  * According to spec, processor _support_ 64 bit will always
  * have MSR beyond IA32_MCG_MISC
  */
-if (!mi || !mig || nr_intel_ext_msrs == 0 ||

[Xen-devel] [PATCH 0/6] mce: fix coding style

2017-09-11 Thread Haozhong Zhang
Some files in xen/arch/x86/cpu/mcheck use mixed coding styles. Unify
them to the Xen hypervisor coding style. For mctelem.c, which is
entirely in one coding style, only extra blanks are removed.

No functional change is introduced.

Haozhong Zhang (6):
  x86/mce: adapt mce.{c,h} to Xen hypervisor coding style
  x86/vmce: adapt vmce.c to Xen hypervisor coding style
  x86/mce: adapt mcation.c to Xen hypervisor coding style
  x86/mce: adapt mce_intel.c to Xen hypervisor coding style
  x86/mce: add emacs block to mctelem.c
  x86/mce: remove trailing spaces in mctelem.c

 xen/arch/x86/cpu/mcheck/mcaction.c  |  74 ++---
 xen/arch/x86/cpu/mcheck/mce.c   | 536 
 xen/arch/x86/cpu/mcheck/mce.h   |  21 +-
 xen/arch/x86/cpu/mcheck/mce_intel.c | 262 ++
 xen/arch/x86/cpu/mcheck/mctelem.c   |  14 +-
 xen/arch/x86/cpu/mcheck/vmce.c  |   5 +-
 6 files changed, 509 insertions(+), 403 deletions(-)

-- 
2.11.0




[Xen-devel] [PATCH 1/6] x86/mce: adapt mce.{c, h} to Xen hypervisor coding style

2017-09-11 Thread Haozhong Zhang
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
 xen/arch/x86/cpu/mcheck/mce.c | 536 +++---
 xen/arch/x86/cpu/mcheck/mce.h |  21 +-
 2 files changed, 311 insertions(+), 246 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 7affe2591e..580e68d6f2 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -64,7 +64,7 @@ struct mca_banks *mca_allbanks;
 int mce_verbosity;
 static int __init mce_set_verbosity(const char *str)
 {
-if (strcmp("verbose", str) == 0)
+if ( strcmp("verbose", str) == 0 )
 mce_verbosity = MCE_VERBOSE;
 else
 return -EINVAL;
@@ -81,7 +81,6 @@ static void unexpected_machine_check(const struct cpu_user_regs *regs)
 fatal_trap(regs, 1);
 }
 
-
 static x86_mce_vector_t _machine_check_vector = unexpected_machine_check;
 
 void x86_mce_vector_register(x86_mce_vector_t hdlr)
@@ -97,11 +96,13 @@ void do_machine_check(const struct cpu_user_regs *regs)
 _machine_check_vector(regs);
 }
 
-/* Init machine check callback handler
+/*
+ * Init machine check callback handler
  * It is used to collect additional information provided by newer
  * CPU families/models without the need to duplicate the whole handler.
  * This avoids having many handlers doing almost nearly the same and each
- * with its own tweaks ands bugs. */
+ * with its own tweaks ands bugs.
+ */
 static x86_mce_callback_t mc_callback_bank_extended = NULL;
 
 void x86_mce_callback_register(x86_mce_callback_t cbfunc)
@@ -109,7 +110,8 @@ void x86_mce_callback_register(x86_mce_callback_t cbfunc)
 mc_callback_bank_extended = cbfunc;
 }
 
-/* Machine check recoverable judgement callback handler
+/*
+ * Machine check recoverable judgement callback handler
  * It is used to judge whether an UC error is recoverable by software
  */
 static mce_recoverable_t mc_recoverable_scan = NULL;
@@ -124,12 +126,12 @@ struct mca_banks *mcabanks_alloc(void)
 struct mca_banks *mb;
 
 mb = xmalloc(struct mca_banks);
-if (!mb)
+if ( !mb )
 return NULL;
 
 mb->bank_map = xzalloc_array(unsigned long,
  BITS_TO_LONGS(nr_mce_banks));
-if (!mb->bank_map)
+if ( !mb->bank_map )
 {
 xfree(mb);
 return NULL;
@@ -142,9 +144,9 @@ struct mca_banks *mcabanks_alloc(void)
 
 void mcabanks_free(struct mca_banks *banks)
 {
-if (banks == NULL)
+if ( banks == NULL )
 return;
-if (banks->bank_map)
+if ( banks->bank_map )
 xfree(banks->bank_map);
 xfree(banks);
 }
@@ -155,15 +157,16 @@ static void mcabank_clear(int banknum)
 
 status = mca_rdmsr(MSR_IA32_MCx_STATUS(banknum));
 
-if (status & MCi_STATUS_ADDRV)
+if ( status & MCi_STATUS_ADDRV )
 mca_wrmsr(MSR_IA32_MCx_ADDR(banknum), 0x0ULL);
-if (status & MCi_STATUS_MISCV)
+if ( status & MCi_STATUS_MISCV )
 mca_wrmsr(MSR_IA32_MCx_MISC(banknum), 0x0ULL);
 
 mca_wrmsr(MSR_IA32_MCx_STATUS(banknum), 0x0ULL);
 }
 
-/* Judging whether to Clear Machine Check error bank callback handler
+/*
+ * Judging whether to Clear Machine Check error bank callback handler
  * According to Intel latest MCA OS Recovery Writer's Guide,
  * whether the error MCA bank needs to be cleared is decided by the mca_source
  * and MCi_status bit value.
const struct mca_error_handler *__read_mostly mce_uhandlers;
 unsigned int __read_mostly mce_dhandler_num;
 unsigned int __read_mostly mce_uhandler_num;
 
-
-static void mca_init_bank(enum mca_source who,
-struct mc_info *mi, int bank)
+static void mca_init_bank(enum mca_source who, struct mc_info *mi, int bank)
 {
 struct mcinfo_bank *mib;
 
-if (!mi)
+if ( !mi )
 return;
 
 mib = x86_mcinfo_reserve(mi, sizeof(*mib), MC_TYPE_BANK);
-if (!mib)
+if ( !mib )
 {
 mi->flags |= MCINFO_FLAGS_UNCOMPLETE;
 return;
@@ -209,26 +210,27 @@ static void mca_init_bank(enum mca_source who,
 mib->mc_bank = bank;
 mib->mc_domid = DOMID_INVALID;
 
-if (mib->mc_status & MCi_STATUS_MISCV)
+if ( mib->mc_status & MCi_STATUS_MISCV )
 mib->mc_misc = mca_rdmsr(MSR_IA32_MCx_MISC(bank));
 
-if (mib->mc_status & MCi_STATUS_ADDRV)
+if ( mib->mc_status & MCi_STATUS_ADDRV )
 mib->mc_addr = mca_rdmsr(MSR_IA32_MCx_ADDR(bank));
 
-if ((mib->mc_status & MCi_STATUS_MISCV) &&
-(mib->mc_status & MCi_STATUS_ADDRV) &&
-(mc_check_addr(mib->mc_status, mib->mc_misc, MC_ADDR_PHYSICAL)) &&
-(who == MCA_POLLER || who == MCA_CMCI_HANDLER) &&
-(mfn_valid(_mfn(paddr_to_pfn(mib->mc_addr)
+if ( (mib->mc_status & MCi_STATUS_MISCV) &&
+ (mib->mc_status & MCi_STATUS_ADDRV) &&

[Xen-devel] [PATCH 3/6] x86/mce: adapt mcation.c to Xen hypervisor coding style

2017-09-11 Thread Haozhong Zhang
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
 xen/arch/x86/cpu/mcheck/mcaction.c | 74 +-
 1 file changed, 41 insertions(+), 33 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mcaction.c b/xen/arch/x86/cpu/mcheck/mcaction.c
index f959bed2cb..e42267414e 100644
--- a/xen/arch/x86/cpu/mcheck/mcaction.c
+++ b/xen/arch/x86/cpu/mcheck/mcaction.c
@@ -6,15 +6,16 @@
 
 static struct mcinfo_recovery *
 mci_action_add_pageoffline(int bank, struct mc_info *mi,
-   uint64_t mfn, uint32_t status)
+   uint64_t mfn, uint32_t status)
 {
 struct mcinfo_recovery *rec;
 
-if (!mi)
+if ( !mi )
 return NULL;
 
 rec = x86_mcinfo_reserve(mi, sizeof(*rec), MC_TYPE_RECOVERY);
-if (!rec) {
+if ( !rec )
+{
 mi->flags |= MCINFO_FLAGS_UNCOMPLETE;
 return NULL;
 }
@@ -46,14 +47,15 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
 int vmce_vcpuid;
 unsigned int mc_vcpuid;
 
-if (!mc_check_addr(bank->mc_status, bank->mc_misc, MC_ADDR_PHYSICAL)) {
+if ( !mc_check_addr(bank->mc_status, bank->mc_misc, MC_ADDR_PHYSICAL) )
+{
 dprintk(XENLOG_WARNING,
-"No physical address provided for memory error\n");
+"No physical address provided for memory error\n");
 return;
 }
 
 mfn = bank->mc_addr >> PAGE_SHIFT;
-if (offline_page(mfn, 1, &status))
+if ( offline_page(mfn, 1, &status) )
 {
 dprintk(XENLOG_WARNING,
 "Failed to offline page %lx for MCE error\n", mfn);
@@ -63,21 +65,26 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
 mci_action_add_pageoffline(binfo->bank, binfo->mi, mfn, status);
 
 /* This is free page */
-if (status & PG_OFFLINE_OFFLINED)
+if ( status & PG_OFFLINE_OFFLINED )
 *result = MCER_RECOVERED;
-else if (status & PG_OFFLINE_AGAIN)
+else if ( status & PG_OFFLINE_AGAIN )
 *result = MCER_CONTINUE;
-else if (status & PG_OFFLINE_PENDING) {
+else if ( status & PG_OFFLINE_PENDING )
+{
 /* This page has owner */
-if (status & PG_OFFLINE_OWNED) {
+if ( status & PG_OFFLINE_OWNED )
+{
 bank->mc_domid = status >> PG_OFFLINE_OWNER_SHIFT;
 mce_printk(MCE_QUIET, "MCE: This error page is ownded"
-  " by DOM %d\n", bank->mc_domid);
-/* XXX: Cannot handle shared pages yet
+   " by DOM %d\n", bank->mc_domid);
+/*
+ * XXX: Cannot handle shared pages yet
  * (this should identify all domains and gfn mapping to
- *  the mfn in question) */
+ *  the mfn in question)
+ */
 BUG_ON( bank->mc_domid == DOMID_COW );
-if ( bank->mc_domid != DOMID_XEN ) {
+if ( bank->mc_domid != DOMID_XEN )
+{
 d = get_domain_by_id(bank->mc_domid);
 ASSERT(d);
 gfn = get_gpfn_from_mfn((bank->mc_addr) >> PAGE_SHIFT);
@@ -85,45 +92,46 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
 if ( unmmap_broken_page(d, _mfn(mfn), gfn) )
 {
 printk("Unmap broken memory %lx for DOM%d failed\n",
-mfn, d->domain_id);
+   mfn, d->domain_id);
 goto vmce_failed;
 }
 
 mc_vcpuid = global->mc_vcpuid;
-if (mc_vcpuid == XEN_MC_VCPUID_INVALID ||
-/*
- * Because MC# may happen asynchronously with the actual
- * operation that triggers the error, the domain ID as
- * well as the vCPU ID collected in 'global' at MC# are
- * not always precise. In that case, fallback to broadcast.
- */
-global->mc_domid != bank->mc_domid ||
-(boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
- (!(global->mc_gstatus & MCG_STATUS_LMCE) ||
-  !(d->vcpu[mc_vcpuid]->arch.vmce.mcg_ext_ctl &
-MCG_EXT_CTL_LMCE_EN
+if ( mc_vcpuid == XEN_MC_VCPUID_INVALID ||
+ /*
+  * Because MC# may happen asynchronously with the actual
+  * operation that triggers the error, the domain ID as
+  * well as the vCPU ID collected in 'global' at MC# are
+  * not always precise. In that case, fallback to broadcast.
+  */
+ global->mc_domid != bank->mc_domid ||
+ (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &am

[Xen-devel] [PATCH] vt-d: use two 32-bit writes to update DMAR fault address registers

2017-09-11 Thread Haozhong Zhang
The 64-bit DMAR fault address is composed of two 32-bit registers,
DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to the VT-d spec:
"Software is expected to access 32-bit registers as aligned doublewords",
a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG and
DMAR_FEUADDR_REG separately in order to update a 64-bit fault address,
rather than a single 64-bit write to DMAR_FEADDR_REG.

Though I haven't seen any errors caused by such a 64-bit write on
real machines, it's still better to follow the specification.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index daaed0abbd..067c092214 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1105,7 +1105,9 @@ static void dma_msi_set_affinity(struct irq_desc *desc, const cpumask_t *mask)
 
     spin_lock_irqsave(&iommu->register_lock, flags);
     dmar_writel(iommu->reg, DMAR_FEDATA_REG, msg.data);
-    dmar_writeq(iommu->reg, DMAR_FEADDR_REG, msg.address);
+    dmar_writel(iommu->reg, DMAR_FEADDR_REG, msg.address_lo);
+    if (x2apic_enabled)
+        dmar_writel(iommu->reg, DMAR_FEUADDR_REG, msg.address_hi);
     spin_unlock_irqrestore(&iommu->register_lock, flags);
 }
 
-- 
2.11.0




Re: [Xen-devel] [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'

2017-09-10 Thread Haozhong Zhang
On 09/10/17 22:10 -0700, Dan Williams wrote:
> On Sun, Sep 10, 2017 at 9:37 PM, Haozhong Zhang
> <haozhong.zh...@intel.com> wrote:
> > The kernel NVDIMM driver and the traditional NVDIMM management
> > utilities in Dom0 do not work now. 'xen-ndctl' is added as an
> > alternative, which manages NVDIMM via Xen hypercalls.
> >
> > Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
> > ---
> > Cc: Ian Jackson <ian.jack...@eu.citrix.com>
> > Cc: Wei Liu <wei.l...@citrix.com>
> > ---
> >  .gitignore |   1 +
> >  tools/misc/Makefile|   4 ++
> >  tools/misc/xen-ndctl.c | 172 
> > +
> >  3 files changed, 177 insertions(+)
> >  create mode 100644 tools/misc/xen-ndctl.c
> 
> What about my offer to move this functionality into the upstream ndctl
> utility [1]? I think it is thoroughly confusing that you are reusing
> the name 'ndctl' and avoiding integration with the upstream ndctl
> utility.
> 
> [1]: https://patchwork.kernel.org/patch/9632865/

I don't object to integrating it with ndctl.

My only concern is that the integration will introduce two types of
user interface. The upstream ndctl works with the kernel driver and
provides easily used *names* (e.g., namespace0.0, region0, nmem0,
etc.) for user input. However, this version of the patchset hides the
NFIT from Dom0 (to simplify the first implementation), so the kernel
driver does not work in Dom0, and neither does ndctl. Instead,
xen-ndctl has to use *physical addresses* for users to specify the
NVDIMM regions they are interested in, which differs from upstream
ndctl.


Haozhong



[Xen-devel] [RFC QEMU PATCH v3 07/10] nvdimm acpi: copy NFIT to Xen guest

2017-09-10 Thread Haozhong Zhang
Xen relies on QEMU to build the guest NFIT.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.e...@gmail.com>
---
 hw/acpi/nvdimm.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 9121a766c6..d9cdc5a531 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -404,6 +404,12 @@ static void nvdimm_build_nfit(AcpiNVDIMMState *state, 
GArray *table_offsets,
 build_header(linker, table_data,
  (void *)(table_data->data + header), "NFIT",
  sizeof(NvdimmNfitHeader) + fit_buf->fit->len, 1, NULL, NULL);
+
+    if (xen_enabled()) {
+        xen_acpi_copy_to_guest("NFIT", table_data->data + header,
+                               sizeof(NvdimmNfitHeader) + fit_buf->fit->len,
+                               XEN_DM_ACPI_BLOB_TYPE_TABLE);
+    }
 }
 
 #define NVDIMM_DSM_MEMORY_SIZE  4096
-- 
2.11.0




[Xen-devel] [RFC QEMU PATCH v3 01/10] nvdimm: do not initialize nvdimm->label_data if label size is zero

2017-09-10 Thread Haozhong Zhang
The memory region of a vNVDIMM on Xen is not a RAM memory region, so
memory_region_get_ram_ptr() cannot be used in nvdimm_realize() to get
a pointer to the label data area in that region. Worse, it may abort
QEMU. As Xen currently does not support labels (i.e. the label size is
0) and every label access in QEMU is guarded by a label-size check,
let's not initialize nvdimm->label_data if the label size is 0.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.e...@gmail.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
---
 hw/mem/nvdimm.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 952fce5ec8..3e58538b99 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -87,7 +87,15 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
 align = memory_region_get_alignment(mr);
 
 pmem_size = size - nvdimm->label_size;
-nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
+/*
+ * The memory region of vNVDIMM on Xen is not a RAM memory region,
+ * so memory_region_get_ram_ptr() below will abort QEMU. In
+ * addition, Xen currently does not support vNVDIMM labels
+ * (i.e. label_size is zero here), so let's not initialize the
+ * pointer to label data if the label size is zero.
+ */
+if (nvdimm->label_size)
+nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
 pmem_size = QEMU_ALIGN_DOWN(pmem_size, align);
 
 if (size <= nvdimm->label_size || !pmem_size) {
-- 
2.11.0




[Xen-devel] [RFC QEMU PATCH v3 10/10] hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled

2017-09-10 Thread Haozhong Zhang
If the machine option 'nvdimm' is enabled and QEMU is used as Xen
device model, construct the guest NFIT and ACPI namespace devices of
vNVDIMM and copy them into guest memory.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: Stefano Stabellini <sstabell...@kernel.org>
Cc: Anthony Perard <anthony.per...@citrix.com>
---
 hw/acpi/aml-build.c   | 10 +++---
 hw/i386/pc.c  | 16 ++--
 hw/i386/xen/xen-hvm.c | 25 +++--
 include/hw/xen/xen.h  |  7 +++
 stubs/xen-hvm.c   |  4 
 5 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 36a6cc450e..5f57c1bef3 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -22,6 +22,7 @@
 #include "qemu/osdep.h"
 #include <glib/gprintf.h>
 #include "hw/acpi/aml-build.h"
+#include "hw/xen/xen.h"
 #include "qemu/bswap.h"
 #include "qemu/bitops.h"
 #include "sysemu/numa.h"
@@ -1531,9 +1532,12 @@ build_header(BIOSLinker *linker, GArray *table_data,
 h->oem_revision = cpu_to_le32(1);
 memcpy(h->asl_compiler_id, ACPI_BUILD_APPNAME4, 4);
 h->asl_compiler_revision = cpu_to_le32(1);
-/* Checksum to be filled in by Guest linker */
-bios_linker_loader_add_checksum(linker, ACPI_BUILD_TABLE_FILE,
-tbl_offset, len, checksum_offset);
+/* No linker is used when QEMU is used as Xen device model. */
+if (!xen_enabled()) {
+/* Checksum to be filled in by Guest linker */
+bios_linker_loader_add_checksum(linker, ACPI_BUILD_TABLE_FILE,
+tbl_offset, len, checksum_offset);
+}
 }
 
 void *acpi_data_push(GArray *table_data, unsigned size)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5cbdce61a7..7101d380a0 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1252,12 +1252,16 @@ void pc_machine_done(Notifier *notifier, void *data)
 }
 }
 
-acpi_setup();
-if (pcms->fw_cfg) {
-pc_build_smbios(pcms);
-pc_build_feature_control_file(pcms);
-/* update FW_CFG_NB_CPUS to account for -device added CPUs */
-fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+if (!xen_enabled()) {
+acpi_setup();
+if (pcms->fw_cfg) {
+pc_build_smbios(pcms);
+pc_build_feature_control_file(pcms);
+/* update FW_CFG_NB_CPUS to account for -device added CPUs */
+fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+}
+} else {
+xen_dm_acpi_setup(pcms);
 }
 
 if (pcms->apic_id_limit > 255) {
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index b74c4ffb9c..d81cc7dbbc 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -265,7 +265,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
 /* RAM already populated in Xen */
 fprintf(stderr, "%s: do not alloc "RAM_ADDR_FMT
 " bytes of ram at "RAM_ADDR_FMT" when runstate is INMIGRATE\n",
-__func__, size, ram_addr); 
+__func__, size, ram_addr);
 return;
 }
 
@@ -1251,7 +1251,7 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
 
 static int xen_dm_acpi_needed(PCMachineState *pcms)
 {
-return 0;
+return pcms->acpi_nvdimm_state.is_enabled;
 }
 
 static int dm_acpi_buf_init(XenIOState *state)
@@ -1309,6 +1309,20 @@ static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
 return dm_acpi_buf_init(state);
 }
 
+static void xen_dm_acpi_nvdimm_setup(PCMachineState *pcms)
+{
+GArray *table_offsets = g_array_new(false, true /* clear */,
+sizeof(uint32_t));
+GArray *table_data = g_array_new(false, true /* clear */, 1);
+
+nvdimm_build_acpi(table_offsets, table_data,
+  NULL, &pcms->acpi_nvdimm_state,
+  MACHINE(pcms)->ram_slots);
+
+g_array_free(table_offsets, true);
+g_array_free(table_data, true);
+}
+
 static int xs_write_dm_acpi_blob_entry(const char *name,
const char *entry, const char *value)
 {
@@ -1408,6 +1422,13 @@ int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
 return 0;
 }
 
+void xen_dm_acpi_setup(PCMachineState *pcms)
+{
+if (pcms->acpi_nvdimm_state.is_enabled) {
+xen_dm_acpi_nvdimm_setup(pcms);
+}
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
 int i, rc;
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 38dcd1a7d4..8c48195e12

[Xen-devel] [RFC QEMU PATCH v3 08/10] nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest

2017-09-10 Thread Haozhong Zhang
Xen relies on QEMU to build the ACPI namespace device of vNVDIMM for
Xen guest.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.e...@gmail.com>
---
 hw/acpi/nvdimm.c | 55 ++-
 1 file changed, 38 insertions(+), 17 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index d9cdc5a531..bf887512ad 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -1226,22 +1226,8 @@ static void nvdimm_build_nvdimm_devices(Aml *root_dev, uint32_t ram_slots)
 }
 }
 
-static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
-  BIOSLinker *linker, GArray *dsm_dma_arrea,
-  uint32_t ram_slots)
+static void nvdimm_build_ssdt_device(Aml *dev, uint32_t ram_slots)
 {
-Aml *ssdt, *sb_scope, *dev;
-int mem_addr_offset, nvdimm_ssdt;
-
-acpi_add_table(table_offsets, table_data);
-
-ssdt = init_aml_allocator();
-acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
-
-sb_scope = aml_scope("\\_SB");
-
-dev = aml_device("NVDR");
-
 /*
  * ACPI 6.0: 9.20 NVDIMM Devices:
  *
@@ -1262,6 +1248,25 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
 nvdimm_build_fit(dev);
 
 nvdimm_build_nvdimm_devices(dev, ram_slots);
+}
+
+static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
+  BIOSLinker *linker, GArray *dsm_dma_arrea,
+  uint32_t ram_slots)
+{
+Aml *ssdt, *sb_scope, *dev;
+int mem_addr_offset, nvdimm_ssdt;
+
+acpi_add_table(table_offsets, table_data);
+
+ssdt = init_aml_allocator();
+acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
+
+sb_scope = aml_scope("\\_SB");
+
+dev = aml_device("NVDR");
+
+nvdimm_build_ssdt_device(dev, ram_slots);
 
 aml_append(sb_scope, dev);
 aml_append(ssdt, sb_scope);
@@ -1285,6 +1290,18 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
 free_aml_allocator();
 }
 
+static void nvdimm_build_xen_ssdt(uint32_t ram_slots)
+{
+Aml *dev = init_aml_allocator();
+
+nvdimm_build_ssdt_device(dev, ram_slots);
+build_append_named_dword(dev->buf, NVDIMM_ACPI_MEM_ADDR);
+xen_acpi_copy_to_guest("NVDR", dev->buf->data, dev->buf->len,
+   XEN_DM_ACPI_BLOB_TYPE_NSDEV);
+
+free_aml_allocator();
+}
+
 void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
BIOSLinker *linker, AcpiNVDIMMState *state,
uint32_t ram_slots)
@@ -1296,8 +1313,12 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
 return;
 }
 
-nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
-  ram_slots);
+if (!xen_enabled()) {
+nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
+  ram_slots);
+} else {
+nvdimm_build_xen_ssdt(ram_slots);
+}
 
 device_list = nvdimm_get_device_list();
 /* no NVDIMM device is plugged. */
-- 
2.11.0




[Xen-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest

2017-09-10 Thread Haozhong Zhang
These are the QEMU-side patches that work with the associated Xen
patches to enable vNVDIMM support for Xen HVM domains. Xen relies on
QEMU to build guest NFIT and NVDIMM namespace devices, and allocate
guest address space for vNVDIMM devices.

All patches can be found at
  Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
  QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3

Patch 1 avoids dereferencing a NULL pointer to non-existent label
data, as the Xen-side support for labels is not implemented yet.

Patches 2 & 3 add a memory backend dedicated for Xen usage and a hotplug
memory region for Xen guest, in order to make the existing nvdimm
device plugging path work on Xen.

Patches 4 - 10 build NFIT in QEMU and copy it to the Xen guest, when
QEMU is used as the Xen device model.


Haozhong Zhang (10):
  nvdimm: do not initialize nvdimm->label_data if label size is zero
  hw/xen-hvm: create the hotplug memory region on Xen
  hostmem-xen: add a host memory backend for Xen
  nvdimm acpi: do not use fw_cfg on Xen
  hw/xen-hvm: initialize DM ACPI
  hw/xen-hvm: add function to copy ACPI into guest memory
  nvdimm acpi: copy NFIT to Xen guest
  nvdimm acpi: copy ACPI namespace device of vNVDIMM to Xen guest
  nvdimm acpi: do not build _FIT method on Xen
  hw/xen-hvm: enable building DM ACPI if vNVDIMM is enabled

 backends/Makefile.objs |   1 +
 backends/hostmem-xen.c | 108 ++
 backends/hostmem.c |   9 +++
 hw/acpi/aml-build.c|  10 ++-
 hw/acpi/nvdimm.c   |  79 ++-
 hw/i386/pc.c   | 102 ++---
 hw/i386/xen/xen-hvm.c  | 204 -
 hw/mem/nvdimm.c|  10 ++-
 hw/mem/pc-dimm.c   |   6 +-
 include/hw/i386/pc.h   |   1 +
 include/hw/xen/xen.h   |  25 ++
 stubs/xen-hvm.c|  10 +++
 12 files changed, 495 insertions(+), 70 deletions(-)
 create mode 100644 backends/hostmem-xen.c

-- 
2.11.0




[Xen-devel] [RFC QEMU PATCH v3 02/10] hw/xen-hvm: create the hotplug memory region on Xen

2017-09-10 Thread Haozhong Zhang
The guest physical address of vNVDIMM is allocated from the hotplug
memory region, which is not created when QEMU is used as Xen device
model. In order to use vNVDIMM for Xen HVM domains, this commit reuses
the code for pc machine type to create the hotplug memory region for
Xen HVM domains.
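As a rough illustration of the base/size arithmetic that the new
pc_memory_hotplug_init() performs (a Python sketch with made-up sizes;
`round_up` and the 1 GiB-per-slot slack mirror the C code in the diff
below, and the helper names here are invented for illustration):

```python
GiB = 1 << 30

def round_up(x, align):
    # align must be a power of two, matching the C ROUND_UP() macro
    return (x + align - 1) & ~(align - 1)

def hotplug_region(above_4g_mem_size, maxram_size, ram_size, ram_slots,
                   enforce_aligned_dimm=True):
    """Sketch of pc_memory_hotplug_init()'s base/size arithmetic."""
    # the region starts above 4 GiB + the above-4G RAM, rounded up to 1 GiB
    base = round_up(0x100000000 + above_4g_mem_size, GiB)
    size = maxram_size - ram_size
    if enforce_aligned_dimm:
        # size the region assuming 1 GiB max page alignment per slot
        size += GiB * ram_slots
    return base, size

# illustrative numbers: 2 GiB above-4G RAM, maxmem 8 GiB, RAM 4 GiB, 2 slots
base, size = hotplug_region(2 * GiB, 8 * GiB, 4 * GiB, 2)
```

This is only a model of the arithmetic; the real function also validates
slot counts and alignment and registers the region as a subregion of
system memory.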

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
CC: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Stefano Stabellini <sstabell...@kernel.org>
Cc: Anthony Perard <anthony.per...@citrix.com>
---
 hw/i386/pc.c  | 86 ---
 hw/i386/xen/xen-hvm.c |  2 ++
 include/hw/i386/pc.h  |  1 +
 3 files changed, 51 insertions(+), 38 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 21081041d5..5cbdce61a7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1347,6 +1347,53 @@ void xen_load_linux(PCMachineState *pcms)
 pcms->fw_cfg = fw_cfg;
 }
 
+void pc_memory_hotplug_init(PCMachineState *pcms, MemoryRegion *system_memory)
+{
+MachineState *machine = MACHINE(pcms);
+PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
+
+if (!pcmc->has_reserved_memory || machine->ram_size >= machine->maxram_size)
+return;
+
+if (memory_region_size(&pcms->hotplug_memory.mr)) {
+error_report("hotplug memory region has been initialized");
+exit(EXIT_FAILURE);
+}
+
+if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
+error_report("unsupported amount of memory slots: %"PRIu64,
+ machine->ram_slots);
+exit(EXIT_FAILURE);
+}
+
+if (QEMU_ALIGN_UP(machine->maxram_size,
+  TARGET_PAGE_SIZE) != machine->maxram_size) {
+error_report("maximum memory size must by aligned to multiple of "
+ "%d bytes", TARGET_PAGE_SIZE);
+exit(EXIT_FAILURE);
+}
+
+pcms->hotplug_memory.base =
+ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
+
+if (pcmc->enforce_aligned_dimm) {
+/* size hotplug region assuming 1G page max alignment per slot */
+hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
+}
+
+if ((pcms->hotplug_memory.base + hotplug_mem_size) < hotplug_mem_size) {
+error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
+ machine->maxram_size);
+exit(EXIT_FAILURE);
+}
+
+memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
+   "hotplug-memory", hotplug_mem_size);
+memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
+&pcms->hotplug_memory.mr);
+}
+
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
@@ -1398,44 +1445,7 @@ void pc_memory_init(PCMachineState *pcms,
 }
 
 /* initialize hotplug memory address space */
-if (pcmc->has_reserved_memory &&
-(machine->ram_size < machine->maxram_size)) {
-ram_addr_t hotplug_mem_size =
-machine->maxram_size - machine->ram_size;
-
-if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
-error_report("unsupported amount of memory slots: %"PRIu64,
- machine->ram_slots);
-exit(EXIT_FAILURE);
-}
-
-if (QEMU_ALIGN_UP(machine->maxram_size,
-  TARGET_PAGE_SIZE) != machine->maxram_size) {
-error_report("maximum memory size must by aligned to multiple of "
- "%d bytes", TARGET_PAGE_SIZE);
-exit(EXIT_FAILURE);
-}
-
-pcms->hotplug_memory.base =
-ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1ULL << 30);
-
-if (pcmc->enforce_aligned_dimm) {
-/* size hotplug region assuming 1G page max alignment per slot */
-hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
-}
-
-if ((pcms->hotplug_memory.base + hotplug_mem_size) <
-hotplug_mem_size) {
-error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
- machine->maxram_size);
-exit(EXIT_FAILURE);
-}
-
-memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
-   "hotplug-memory", hotplug_mem_size);
-memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
-&pcms->hotplug_memory.mr);
-}
+pc_mem

[Xen-devel] [RFC QEMU PATCH v3 05/10] hw/xen-hvm: initialize DM ACPI

2017-09-10 Thread Haozhong Zhang
Probe the base address and the length of the guest ACPI buffer
reserved for copying ACPI from QEMU.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Stefano Stabellini <sstabell...@kernel.org>
Cc: Anthony Perard <anthony.per...@citrix.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 66 +++
 1 file changed, 66 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 90163e1a1b..ae895aaf03 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -18,6 +18,7 @@
 #include "hw/xen/xen_backend.h"
 #include "qmp-commands.h"
 
+#include "qemu/cutils.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
 #include "sysemu/xen-mapcache.h"
@@ -86,6 +87,18 @@ typedef struct XenPhysmap {
 QLIST_ENTRY(XenPhysmap) list;
 } XenPhysmap;
 
+#define HVM_XS_DM_ACPI_ROOT "/hvmloader/dm-acpi"
+#define HVM_XS_DM_ACPI_ADDRESS HVM_XS_DM_ACPI_ROOT"/address"
+#define HVM_XS_DM_ACPI_LENGTH  HVM_XS_DM_ACPI_ROOT"/length"
+
+typedef struct XenAcpiBuf {
+ram_addr_t base;
+ram_addr_t length;
+ram_addr_t used;
+} XenAcpiBuf;
+
+static XenAcpiBuf *dm_acpi_buf;
+
 typedef struct XenIOState {
 ioservid_t ioservid;
 shared_iopage_t *shared_page;
@@ -110,6 +123,8 @@ typedef struct XenIOState {
 hwaddr free_phys_offset;
 const XenPhysmap *log_for_dirtybit;
 
+XenAcpiBuf dm_acpi_buf;
+
 Notifier exit;
 Notifier suspend;
 Notifier wakeup;
@@ -1234,6 +1249,52 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
 xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
 }
 
+static int xen_dm_acpi_needed(PCMachineState *pcms)
+{
+return 0;
+}
+
+static int dm_acpi_buf_init(XenIOState *state)
+{
+char path[80], *value;
+unsigned int len;
+
+dm_acpi_buf = &state->dm_acpi_buf;
+
+snprintf(path, sizeof(path),
+ "/local/domain/%d"HVM_XS_DM_ACPI_ADDRESS, xen_domid);
+value = xs_read(state->xenstore, 0, path, &len);
+if (!value) {
+return -EINVAL;
+}
+if (qemu_strtoul(value, NULL, 16, &dm_acpi_buf->base)) {
+return -EINVAL;
+}
+
+snprintf(path, sizeof(path),
+ "/local/domain/%d"HVM_XS_DM_ACPI_LENGTH, xen_domid);
+value = xs_read(state->xenstore, 0, path, &len);
+if (!value) {
+return -EINVAL;
+}
+if (qemu_strtoul(value, NULL, 16, &dm_acpi_buf->length)) {
+return -EINVAL;
+}
+
+dm_acpi_buf->used = 0;
+
+return 0;
+}
+
+static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
+{
+if (!xen_dm_acpi_needed(pcms)) {
+return 0;
+}
+
+return dm_acpi_buf_init(state);
+}
+
 void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
 int i, rc;
@@ -1385,6 +1446,11 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 /* Disable ACPI build because Xen handles it */
 pcms->acpi_build_enabled = false;
 
+if (xen_dm_acpi_init(pcms, state)) {
+error_report("failed to initialize xen ACPI");
+goto err;
+}
+
 return;
 
 err:
-- 
2.11.0




[Xen-devel] [RFC QEMU PATCH v3 03/10] hostmem-xen: add a host memory backend for Xen

2017-09-10 Thread Haozhong Zhang
vNVDIMM requires a host memory backend to allocate its backend
resources to the guest. When QEMU is used as Xen device model, the
backend resource allocation of vNVDIMM is managed outside of QEMU. A new
host memory backend 'memory-backend-xen' is introduced to represent
the backend resource allocated by Xen. It simply creates a memory
region of the specified size as a placeholder in the guest address
space, which will be mapped by Xen to the actual backend resource.

Following example QEMU options create a vNVDIMM device backed by a 4GB
host PMEM region at host physical address 0x1:
   -object memory-backend-xen,id=mem1,host-addr=0x1,size=4G
   -device nvdimm,id=nvdimm1,memdev=mem1

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
---
 backends/Makefile.objs |   1 +
 backends/hostmem-xen.c | 108 +
 backends/hostmem.c |   9 +
 hw/mem/pc-dimm.c   |   6 ++-
 4 files changed, 123 insertions(+), 1 deletion(-)
 create mode 100644 backends/hostmem-xen.c

diff --git a/backends/Makefile.objs b/backends/Makefile.objs
index 0400799efd..3096fde21f 100644
--- a/backends/Makefile.objs
+++ b/backends/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-$(CONFIG_TPM) += tpm.o
 
 common-obj-y += hostmem.o hostmem-ram.o
 common-obj-$(CONFIG_LINUX) += hostmem-file.o
+common-obj-${CONFIG_XEN_BACKEND} += hostmem-xen.o
 
 common-obj-y += cryptodev.o
 common-obj-y += cryptodev-builtin.o
diff --git a/backends/hostmem-xen.c b/backends/hostmem-xen.c
new file mode 100644
index 00..99211efd81
--- /dev/null
+++ b/backends/hostmem-xen.c
@@ -0,0 +1,108 @@
+/*
+ * QEMU Host Memory Backend for Xen
+ *
+ * Copyright(C) 2017 Intel Corporation.
+ *
+ * Author:
+ *   Haozhong Zhang <haozhong.zh...@intel.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/hostmem.h"
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+
+#define TYPE_MEMORY_BACKEND_XEN "memory-backend-xen"
+
+#define MEMORY_BACKEND_XEN(obj) \
+OBJECT_CHECK(HostMemoryBackendXen, (obj), TYPE_MEMORY_BACKEND_XEN)
+
+typedef struct HostMemoryBackendXen HostMemoryBackendXen;
+
+struct HostMemoryBackendXen {
+HostMemoryBackend parent_obj;
+
+uint64_t host_addr;
+};
+
+static void xen_backend_get_host_addr(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+HostMemoryBackendXen *backend = MEMORY_BACKEND_XEN(obj);
+uint64_t value = backend->host_addr;
+
+visit_type_size(v, name, &value, errp);
+}
+
+static void xen_backend_set_host_addr(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+HostMemoryBackendXen *xb = MEMORY_BACKEND_XEN(obj);
+Error *local_err = NULL;
+uint64_t value;
+
+if (memory_region_size(&backend->mr)) {
+error_setg(&local_err, "cannot change property value");
+goto out;
+}
+
+visit_type_size(v, name, &value, &local_err);
+if (local_err) {
+goto out;
+}
+xb->host_addr = value;
+
+ out:
+error_propagate(errp, local_err);
+}
+
+static void xen_backend_alloc(HostMemoryBackend *backend, Error **errp)
+{
+if (!backend->size) {
+error_setg(errp, "can't create backend with size 0");
+return;
+}
+memory_region_init(&backend->mr, OBJECT(backend), "hostmem-xen",
+   backend->size);
+backend->mr.align = getpagesize();
+}
+
+static void xen_backend_class_init(ObjectClass *oc, void *data)
+{
+HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
+
+bc->alloc = xen_backend_alloc;
+
+object_class_property_add(oc, "host-addr", "int",
+  xen_backend_get_host_addr,
+  xen_backend_set_host_addr,
+  NULL, NULL, &error_abort);
+}
+
+static const TypeInfo xen_backend_info = {
+.name = TYPE_MEMORY_BACKEND_XEN,
+.parent = TYPE_MEMORY_BACKEND,
+.class_init = xen_ba

[Xen-devel] [RFC QEMU PATCH v3 04/10] nvdimm acpi: do not use fw_cfg on Xen

2017-09-10 Thread Haozhong Zhang
Xen relies on QEMU to build guest ACPI for NVDIMM. However, no fw_cfg
is created when QEMU is used as Xen device model, so QEMU should avoid
using fw_cfg on Xen.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Xiao Guangrong <xiaoguangrong.e...@gmail.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
---
 hw/acpi/nvdimm.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 6ceea196e7..9121a766c6 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -32,6 +32,7 @@
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/xen/xen.h"
 
 static int nvdimm_device_list(Object *obj, void *opaque)
 {
@@ -890,8 +891,12 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 
 state->dsm_mem = g_array_new(false, true /* clear */, 1);
 acpi_data_push(state->dsm_mem, sizeof(NvdimmDsmIn));
-fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
-state->dsm_mem->len);
+
+/* No fw_cfg is created when QEMU is used as Xen device model. */
+if (!xen_enabled()) {
+fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
+state->dsm_mem->len);
+}
 
 nvdimm_init_fit_buffer(&state->fit_buf);
 }
-- 
2.11.0




[Xen-devel] [RFC QEMU PATCH v3 06/10] hw/xen-hvm: add function to copy ACPI into guest memory

2017-09-10 Thread Haozhong Zhang
Xen relies on QEMU to build guest NFIT and NVDIMM namespace devices,
and implements an interface to allow QEMU to copy its ACPI into guest
memory. This commit implements the QEMU side support.

The location of guest memory that can receive QEMU's ACPI can be found
from the XenStore entries /local/domain/$dom_id/hvmloader/dm-acpi/{address,length},
which were populated by the previous commit.

ACPI copied from QEMU to the guest is organized in blobs. For each
blob, QEMU creates the following XenStore entries under
/local/domain/$dom_id/hvmloader/dm-acpi/$name to indicate its type,
its location within the guest memory region above, and its size.
 - type   the type of the passed ACPI, which can be the following
  values.
* XEN_DM_ACPI_BLOB_TYPE_TABLE (0) indicates it's a complete ACPI
  table, and its signature is indicated by $name in the XenStore
  path.
* XEN_DM_ACPI_BLOB_TYPE_NSDEV (1) indicates it's the body of a
  namespace device, and its device name is indicated by $name in
  the XenStore path.
 - offset  offset in bytes from the beginning of above guest memory region
 - length  size in bytes of the copied ACPI
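A rough model of this bookkeeping (an illustrative Python sketch: the
dict stands in for the xs_write() calls, the base address is made up,
and the class/method names are invented; the real code is in the diff
below):

```python
XEN_DM_ACPI_BLOB_TYPE_TABLE = 0  # complete ACPI table
XEN_DM_ACPI_BLOB_TYPE_NSDEV = 1  # body of a namespace device

class DmAcpiBuf:
    """Models dm_acpi_buf_alloc() plus the per-blob XenStore entries."""
    def __init__(self, base, length):
        self.base, self.length, self.used = base, length, 0
        self.entries = {}  # $name -> {type, offset, length}

    def copy_blob(self, name, blob, blob_type):
        if self.length - self.used < len(blob):
            return None  # buffer exhausted; QEMU returns -ENOMEM here
        addr = self.base + self.used
        self.used += len(blob)
        self.entries[name] = {
            "type": blob_type,
            "offset": addr - self.base,  # offset within the guest region
            "length": len(blob),
        }
        return addr

buf = DmAcpiBuf(base=0xFC000000, length=4096)
buf.copy_blob("NFIT", b"\x00" * 200, XEN_DM_ACPI_BLOB_TYPE_TABLE)
buf.copy_blob("NVDR", b"\x00" * 100, XEN_DM_ACPI_BLOB_TYPE_NSDEV)
```

Each blob is allocated bump-pointer style from the reserved guest
region, and the per-blob entries let hvmloader locate and validate it.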

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Stefano Stabellini <sstabell...@kernel.org>
Cc: Anthony Perard <anthony.per...@citrix.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
---
 hw/i386/xen/xen-hvm.c | 113 ++
 include/hw/xen/xen.h  |  18 
 stubs/xen-hvm.c   |   6 +++
 3 files changed, 137 insertions(+)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index ae895aaf03..b74c4ffb9c 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1286,6 +1286,20 @@ static int dm_acpi_buf_init(XenIOState *state)
 return 0;
 }
 
+static ram_addr_t dm_acpi_buf_alloc(size_t length)
+{
+ram_addr_t addr;
+
+if (dm_acpi_buf->length - dm_acpi_buf->used < length) {
+return 0;
+}
+
+addr = dm_acpi_buf->base + dm_acpi_buf->used;
+dm_acpi_buf->used += length;
+
+return addr;
+}
+
 static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
 {
 if (!xen_dm_acpi_needed(pcms)) {
@@ -1295,6 +1309,105 @@ static int xen_dm_acpi_init(PCMachineState *pcms, XenIOState *state)
 return dm_acpi_buf_init(state);
 }
 
+static int xs_write_dm_acpi_blob_entry(const char *name,
+   const char *entry, const char *value)
+{
+XenIOState *state = container_of(dm_acpi_buf, XenIOState, dm_acpi_buf);
+char path[80];
+
+snprintf(path, sizeof(path),
+ "/local/domain/%d"HVM_XS_DM_ACPI_ROOT"/%s/%s",
+ xen_domid, name, entry);
+if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
+return -EIO;
+}
+
+return 0;
+}
+
+static size_t xen_memcpy_to_guest(ram_addr_t gpa,
+  const void *buf, size_t length)
+{
+size_t copied = 0, size;
+ram_addr_t s, e, offset, cur = gpa;
+xen_pfn_t cur_pfn;
+void *page;
+
+if (!buf || !length) {
+return 0;
+}
+
+s = gpa & TARGET_PAGE_MASK;
+e = gpa + length;
+if (e < s) {
+return 0;
+}
+
+while (cur < e) {
+cur_pfn = cur >> TARGET_PAGE_BITS;
+offset = cur - (cur_pfn << TARGET_PAGE_BITS);
+size = (length >= TARGET_PAGE_SIZE - offset) ?
+   TARGET_PAGE_SIZE - offset : length;
+
+page = xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ | PROT_WRITE,
+1, &cur_pfn, NULL);
+if (!page) {
+break;
+}
+
+memcpy(page + offset, buf, size);
+xenforeignmemory_unmap(xen_fmem, page, 1);
+
+copied += size;
+buf += size;
+cur += size;
+length -= size;
+}
+
+return copied;
+}
+
+int xen_acpi_copy_to_guest(const char *name, const void *blob, size_t length,
+   int type)
+{
+char value[21];
+ram_addr_t buf_addr;
+int rc;
+
+if (type != XEN_DM_ACPI_BLOB_TYPE_TABLE &&
+type != XEN_DM_ACPI_BLOB_TYPE_NSDEV) {
+return -EINVAL;
+}
+
+buf_addr = dm_acpi_buf_alloc(length);
+if (!buf_addr) {
+return -ENOMEM;
+}
+if (xen_memcpy_to_guest(buf_addr, blob, length) != length) {
+return -EIO;
+}
+
+snprintf(value, sizeof(value), "%d", type);
+rc = xs_write_dm_acpi_blob_entry(name, "type", value);
+if (rc) {
+return rc;
+}
+
+snprintf(value, sizeof(value), "%"PRIu64, buf_addr - dm_acpi_buf->base);
+rc = xs_write_dm_acpi_blob_entry(name, "offset", value);
+if (rc) {
+return rc;
+}
+
+snprintf(value, sizeof(val

[Xen-devel] [RFC QEMU PATCH v3 09/10] nvdimm acpi: do not build _FIT method on Xen

2017-09-10 Thread Haozhong Zhang
Xen currently does not support vNVDIMM hotplug and always sets the QEMU
option "maxmem" to be just enough for RAM and vNVDIMM, so it's not
necessary to build the _FIT method when QEMU is used as the Xen device model.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.e...@gmail.com>
---
 hw/acpi/nvdimm.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index bf887512ad..61789c3966 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -1245,7 +1245,14 @@ static void nvdimm_build_ssdt_device(Aml *dev, uint32_t ram_slots)
 
 /* 0 is reserved for root device. */
 nvdimm_build_device_dsm(dev, 0);
-nvdimm_build_fit(dev);
+/*
+ * Xen does not support vNVDIMM hotplug, and always sets the QEMU
+ * option "maxmem" to be just enough for RAM and static plugged
+ * vNVDIMM, so it's unnecessary to build _FIT method on Xen.
+ */
+if (!xen_enabled()) {
+nvdimm_build_fit(dev);
+}
 
 nvdimm_build_nvdimm_devices(dev, ram_slots);
 }
-- 
2.11.0




[Xen-devel] [RFC XEN PATCH v3 34/39] tools/libacpi: add DM ACPI blacklists

2017-09-10 Thread Haozhong Zhang
Some guest ACPI tables and namespace devices are constructed by Xen,
and should not be loaded from the device model. This commit adds their
table signatures and device names to two blacklists, which will be
used to check for collisions between the guest ACPI constructed by Xen
and the guest ACPI passed from the device model.
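A sketch of how the blacklists might be consulted when a DM ACPI blob
arrives (illustrative Python; the entries mirror the signatures and
device names blacklisted in the diff below, while the function name and
lookup structure are invented for this example):

```python
# Signatures of tables Xen builds itself (MADT's signature is "APIC"),
# and namespace device names Xen's SSDTs already define.
XEN_BUILT_SIGS = {b"APIC", b"HPET", b"WAET", b"TCPA"}
XEN_BUILT_DEVS = {"AC", "BAT0", "BAT1", "TPM"}

def dm_blob_allowed(name, blob_type):
    """blob_type: 0 = ACPI table (name is the 4-char signature),
       1 = namespace device (name is the device name)."""
    if blob_type == 0:
        return name.encode()[:4] not in XEN_BUILT_SIGS
    return name not in XEN_BUILT_DEVS
```

A blob that collides with a Xen-built table or device is rejected, so
the device model cannot shadow ACPI objects the hypervisor toolstack
already provides.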

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/libacpi/build.c   | 93 +
 tools/libacpi/libacpi.h |  5 +++
 2 files changed, 98 insertions(+)

diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index f9881c9604..493ca48025 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -56,6 +56,76 @@ struct acpi_info {
 uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */
 };
 
+/* ACPI tables of following signatures should not appear in DM ACPI */
+static uint64_t dm_acpi_signature_blacklist[64];
+/* ACPI namespace devices of following names should not appear in DM ACPI */
+static const char *dm_acpi_devname_blacklist[64];
+
+static int dm_acpi_blacklist_signature(struct acpi_config *config, uint64_t sig)
+{
+unsigned int i, nr = ARRAY_SIZE(dm_acpi_signature_blacklist);
+
+if ( !(config->table_flags & ACPI_HAS_DM) )
+return 0;
+
+for ( i = 0; i < nr; i++ )
+{
+uint64_t entry = dm_acpi_signature_blacklist[i];
+
+if ( entry == sig )
+return 0;
+else if ( entry == 0 )
+break;
+}
+
+if ( i >= nr )
+{
+config->table_flags &= ~ACPI_HAS_DM;
+
+printf("ERROR: DM ACPI signature blacklist is full (size %u), "
+   "disable DM ACPI\n", nr);
+
+return -ENOSPC;
+}
+
+dm_acpi_signature_blacklist[i] = sig;
+
+return 0;
+}
+
+static int dm_acpi_blacklist_devname(struct acpi_config *config,
+ const char *devname)
+{
+unsigned int i, nr = ARRAY_SIZE(dm_acpi_devname_blacklist);
+
+if ( !(config->table_flags & ACPI_HAS_DM) )
+return 0;
+
+for ( i = 0; i < nr; i++ )
+{
+const char *entry = dm_acpi_devname_blacklist[i];
+
+if ( !entry )
+break;
+if ( !strncmp(entry, devname, 4) )
+return 0;
+}
+
+if ( i >= nr )
+{
+config->table_flags &= ~ACPI_HAS_DM;
+
+printf("ERROR: DM ACPI devname blacklist is full (size %u), "
+   "disable loading DM ACPI\n", nr);
+
+return -ENOSPC;
+}
+
+dm_acpi_devname_blacklist[i] = devname;
+
+return 0;
+}
+
 static void set_checksum(
 void *table, uint32_t checksum_offset, uint32_t length)
 {
@@ -360,6 +430,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
 madt = construct_madt(ctxt, config, info);
 if (!madt) return -1;
 table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, madt);
+dm_acpi_blacklist_signature(config, madt->header.signature);
 }
 
 /* HPET. */
@@ -368,6 +439,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
 hpet = construct_hpet(ctxt, config);
 if (!hpet) return -1;
 table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, hpet);
+dm_acpi_blacklist_signature(config, hpet->header.signature);
 }
 
 /* WAET. */
@@ -377,6 +449,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
 if ( !waet )
 return -1;
 table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, waet);
+dm_acpi_blacklist_signature(config, waet->header.signature);
 }
 
 if ( config->table_flags & ACPI_HAS_SSDT_PM )
@@ -385,6 +458,9 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
 if (!ssdt) return -1;
 memcpy(ssdt, ssdt_pm, sizeof(ssdt_pm));
 table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, ssdt);
+dm_acpi_blacklist_devname(config, "AC");
+dm_acpi_blacklist_devname(config, "BAT0");
+dm_acpi_blacklist_devname(config, "BAT1");
 }
 
 if ( config->table_flags & ACPI_HAS_SSDT_S3 )
@@ -450,6 +526,8 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
  offsetof(struct acpi_header, checksum),
  tcpa->header.length);
 }
+dm_acpi_blacklist_signature(config, tcpa->header.signature);
+dm_acpi_blacklist_devname(config, "TPM");
 }
 
 /* SRAT and SLIT */
@@ -459,11 +537,17 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
 struct acpi_20_slit *slit = construct_slit(ctxt, config);
 
 if ( srat )
+{
 table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ct

[Xen-devel] [RFC XEN PATCH v3 37/39] tools/libxl: allow aborting domain creation on fatal QMP init errors

2017-09-10 Thread Haozhong Zhang
If an error during QMP initialization can affect the proper functioning
of a domain, it is better to treat it as fatal and abort the creation of
that domain. The existing types of QMP initialization errors remain
non-fatal and, as before, do not abort domain creation.
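
The distinction can be sketched as follows. This is an illustrative model only; the error codes are stand-ins, not libxl's actual values:

```c
#include <assert.h>

/* Stand-in error codes modelling libxl's ERROR_FAIL / ERROR_BADFAIL. */
#define OK            0
#define ERROR_FAIL    (-3)   /* ignorable: domain creation continues */
#define ERROR_BADFAIL (-30)  /* fatal: domain creation is aborted */

/* Map a raw QMP result to an ignorable or fatal libxl-style error,
 * mirroring the "ret ? (ignore_error ? ...) : 0" pattern in the patch. */
static int classify_qmp_result(int ret, int ignore_error)
{
    if ( !ret )
        return OK;
    return ignore_error ? ERROR_FAIL : ERROR_BADFAIL;
}
```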

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/libxl/libxl_create.c | 4 +++-
 tools/libxl/libxl_qmp.c| 9 ++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 9123585b52..3e05ea09e9 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1507,7 +1507,9 @@ static void domcreate_devmodel_started(libxl__egc *egc,
 if (dcs->sdss.dm.guest_domid) {
 if (d_config->b_info.device_model_version
 == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
-libxl__qmp_initializations(gc, domid, d_config);
+ret = libxl__qmp_initializations(gc, domid, d_config);
+if (ret == ERROR_BADFAIL)
+goto error_out;
 }
 }
 
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index eab993aca9..e1eb47c1d2 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -1175,11 +1175,12 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
 {
 const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config);
 libxl__qmp_handler *qmp = NULL;
-int ret = 0;
+bool ignore_error = true;
+int ret = -1;
 
 qmp = libxl__qmp_initialize(gc, domid);
 if (!qmp)
-return -1;
+goto out;
 ret = libxl__qmp_query_serial(qmp);
 if (!ret && vnc && vnc->passwd) {
 ret = qmp_change(gc, qmp, "vnc", "password", vnc->passwd);
@@ -1189,7 +1190,9 @@ int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid,
 ret = qmp_query_vnc(qmp);
 }
 libxl__qmp_close(qmp);
-return ret;
+
+ out:
+return ret ? (ignore_error ? ERROR_FAIL : ERROR_BADFAIL) : 0;
 }
 
 /*
-- 
2.14.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [RFC XEN PATCH v3 36/39] tools/xl: add xl domain configuration for virtual NVDIMM devices

2017-09-10 Thread Haozhong Zhang
A new xl domain configuration
   vnvdimms = [ 'type=mfn, backend=START_PMEM_MFN, nr_pages=N', ... ]

is added to specify the virtual NVDIMM devices backed by the specified
host PMEM pages. As the kernel PMEM driver does not work in Dom0 now,
we have to specify MFNs.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 docs/man/xl.cfg.pod.5.in|  33 +
 tools/libxl/Makefile|   2 +-
 tools/libxl/libxl.h |   5 ++
 tools/libxl/libxl_types.idl |  15 ++
 tools/libxl/libxl_vnvdimm.c |  49 
 tools/xl/xl_parse.c | 110 +++-
 tools/xl/xl_vmcontrol.c |  15 +-
 7 files changed, 226 insertions(+), 3 deletions(-)
 create mode 100644 tools/libxl/libxl_vnvdimm.c

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index 79cb2eaea7..092b051561 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -1116,6 +1116,39 @@ FIFO-based event channel ABI support up to 131,071 event 
channels.
 Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
 x86).
 
+=item B

[Xen-devel] [RFC XEN PATCH v3 39/39] tools/libxl: build qemu options from xl vNVDIMM configs

2017-09-10 Thread Haozhong Zhang
For xl configs
  vnvdimms = [ 'type=mfn,backend=$PMEM0_MFN,nr_pages=$N0', ... ]

the following qemu options will be built

  -machine <machine>,nvdimm
  -m <ram_size>,slots=$NR_SLOTS,maxmem=$MEM_SIZE
  -object memory-backend-xen,id=mem1,host-addr=$PMEM0_ADDR,size=$PMEM0_SIZE
  -device nvdimm,id=xen_nvdimm1,memdev=mem1
  ...

in which,
 - NR_SLOTS is the number of entries in vnvdimms + 1,
 - MEM_SIZE is the total size of all RAM and NVDIMM devices,
 - PMEM0_ADDR = PMEM0_MFN * 4096,
 - PMEM0_SIZE = N0 * 4096,
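
These formulas can be expressed directly. The sketch below, a standalone model rather than the libxl code itself, also includes the overflow check performed when summing the RAM and NVDIMM sizes (PAGE_SHIFT = 12 is assumed, matching XC_PAGE_SHIFT on x86):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumed 4KB pages, as XC_PAGE_SHIFT on x86 */

/* PMEMi_ADDR = PMEMi_MFN * 4096 */
static uint64_t pmem_addr(uint64_t start_mfn) { return start_mfn << PAGE_SHIFT; }

/* PMEMi_SIZE = Ni * 4096 */
static uint64_t pmem_size(uint64_t nr_pages)  { return nr_pages << PAGE_SHIFT; }

/* NR_SLOTS is the number of vnvdimms entries plus one. */
static unsigned int nr_slots(unsigned int num_vnvdimms) { return num_vnvdimms + 1; }

/* Detect uint64_t wrap-around when summing RAM and NVDIMM sizes,
 * as the libxl change does before emitting maxmem. */
static int sizes_overflow(uint64_t ram_bytes, uint64_t nvdimm_bytes)
{
    return ram_bytes + nvdimm_bytes < ram_bytes;
}
```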

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/libxl/libxl_dm.c | 81 --
 1 file changed, 79 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index e0e6a99e67..9bdb3cdb29 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -910,6 +910,58 @@ static char *qemu_disk_ide_drive_string(libxl__gc *gc, const char *target_path,
 return drive;
 }
 
+#if defined(__linux__)
+
+static uint64_t libxl__build_dm_vnvdimm_args(
+libxl__gc *gc, flexarray_t *dm_args,
+struct libxl_device_vnvdimm *dev, int dev_no)
+{
+uint64_t addr = 0, size = 0;
+char *arg;
+
+switch (dev->backend_type)
+{
+case LIBXL_VNVDIMM_BACKEND_TYPE_MFN:
+addr = dev->u.mfn << XC_PAGE_SHIFT;
+size = dev->nr_pages << XC_PAGE_SHIFT;
+break;
+}
+
+if (!size)
+return 0;
+
+flexarray_append(dm_args, "-object");
+arg = GCSPRINTF("memory-backend-xen,id=mem%d,host-addr=%"PRIu64",size=%"PRIu64,
+dev_no + 1, addr, size);
+flexarray_append(dm_args, arg);
+
+flexarray_append(dm_args, "-device");
+arg = GCSPRINTF("nvdimm,id=xen_nvdimm%d,memdev=mem%d",
+dev_no + 1, dev_no + 1);
+flexarray_append(dm_args, arg);
+
+return size;
+}
+
+static uint64_t libxl__build_dm_vnvdimms_args(
+libxl__gc *gc, flexarray_t *dm_args,
+struct libxl_device_vnvdimm *vnvdimms, int num_vnvdimms)
+{
+uint64_t total_size = 0, size;
+unsigned int i;
+
+for (i = 0; i < num_vnvdimms; i++) {
+size = libxl__build_dm_vnvdimm_args(gc, dm_args, &vnvdimms[i], i);
+if (!size)
+break;
+total_size += size;
+}
+
+return total_size;
+}
+
+#endif /* __linux__ */
+
 static int libxl__build_device_model_args_new(libxl__gc *gc,
 const char *dm, int guest_domid,
 const libxl_domain_config 
*guest_config,
@@ -923,13 +975,18 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
 const libxl_device_nic *nics = guest_config->nics;
 const int num_disks = guest_config->num_disks;
 const int num_nics = guest_config->num_nics;
+#if defined(__linux__)
+const int num_vnvdimms = guest_config->num_vnvdimms;
+#else
+const int num_vnvdimms = 0;
+#endif
 const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config);
 const libxl_sdl_info *sdl = dm_sdl(guest_config);
 const char *keymap = dm_keymap(guest_config);
 char *machinearg;
 flexarray_t *dm_args, *dm_envs;
 int i, connection, devid, ret;
-uint64_t ram_size;
+uint64_t ram_size, ram_size_in_byte = 0, vnvdimms_size = 0;
 const char *path, *chardev;
 char *user = NULL;
 
@@ -1451,6 +1508,9 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
 }
 }
 
+if (num_vnvdimms)
+machinearg = libxl__sprintf(gc, "%s,nvdimm", machinearg);
+
 flexarray_append(dm_args, machinearg);
 for (i = 0; b_info->extra_hvm && b_info->extra_hvm[i] != NULL; i++)
 flexarray_append(dm_args, b_info->extra_hvm[i]);
@@ -1460,8 +1520,25 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
 }
 
 ram_size = libxl__sizekb_to_mb(b_info->max_memkb - b_info->video_memkb);
+if (num_vnvdimms) {
+ram_size_in_byte = ram_size << 20;
+vnvdimms_size = libxl__build_dm_vnvdimms_args(gc, dm_args,
+  guest_config->vnvdimms,
+  num_vnvdimms);
+if (ram_size_in_byte + vnvdimms_size < ram_size_in_byte) {
+LOG(ERROR,
+"total size of RAM (%"PRIu64") and NVDIMM (%"PRIu64") overflow",
+ram_size_in_byte, vnvdimms_size);
+return ERROR_INVAL;
+}
+}
 flexarray_append(dm_args, "-m");
-flexarray_append(dm_args, GCSPRINTF("%"PRId64, ram_size));
+flexarray_append(dm_args,
+ vnvdimms_size ?
+ GCSPRINTF("%"PRId64",slots=%d,maxmem=%"PRId64,

[Xen-devel] [RFC XEN PATCH v3 32/39] tools/libacpi: add callbacks to access XenStore

2017-09-10 Thread Haozhong Zhang
libacpi needs to access information placed in XenStore in order to
load ACPI built by the device model.
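
XS_DIRECTORY returns its entries as a single '\0'-separated string. Below is a simplified, self-contained model of how the patch turns that into an array of C strings (the real code additionally copies the data out of xenbus's static buffer via the libacpi allocator):

```c
#include <assert.h>
#include <string.h>

/* Count the entries in a '\0'-separated string list of @len bytes. */
static unsigned int count_strings(const char *strings, unsigned int len)
{
    const char *p;
    unsigned int n = 0;

    for ( p = strings; p < strings + len; p++ )
        if ( *p == '\0' )
            n++;

    return n;
}

/* Fill @out with pointers to each entry; return the number filled. */
static unsigned int split_strings(char *strings, unsigned int len,
                                  char **out, unsigned int max)
{
    char *p;
    unsigned int n = 0;

    for ( p = strings; p < strings + len && n < max; p += strlen(p) + 1 )
        out[n++] = p;

    return n;
}
```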

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/firmware/hvmloader/util.c   | 52 +++
 tools/firmware/hvmloader/util.h   |  9 +++
 tools/firmware/hvmloader/xenbus.c | 44 +++--
 tools/libacpi/libacpi.h   | 10 
 tools/libxl/libxl_x86_acpi.c  | 24 ++
 5 files changed, 126 insertions(+), 13 deletions(-)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 2f8a4654b0..5b8a4ee9d0 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -893,6 +893,53 @@ static uint32_t acpi_lapic_id(unsigned cpu)
 return LAPIC_ID(cpu);
 }
 
+static const char *acpi_xs_read(struct acpi_ctxt *ctxt, const char *path)
+{
+return xenstore_read(path, NULL);
+}
+
+static int acpi_xs_write(struct acpi_ctxt *ctxt,
+ const char *path, const char *value)
+{
+return xenstore_write(path, value);
+}
+
+static unsigned int count_strings(const char *strings, unsigned int len)
+{
+const char *p;
+unsigned int n;
+
+for ( p = strings, n = 0; p < strings + len; p++ )
+if ( *p == '\0' )
+n++;
+
+return n;
+}
+
+static char **acpi_xs_directory(struct acpi_ctxt *ctxt,
+const char *path, unsigned int *num)
+{
+const char *strings;
+char *s, *p, **ret;
+unsigned int len, n;
+
+strings = xenstore_directory(path, &len, NULL);
+if ( !strings )
+return NULL;
+
+n = count_strings(strings, len);
+ret = ctxt->mem_ops.alloc(ctxt, n * sizeof(p) + len, 0);
+if ( !ret )
+return NULL;
+memcpy(&ret[n], strings, len);
+
+s = (char *)&ret[n];
+for ( p = s, *num = 0; p < s + len; p += strlen(p) + 1 )
+ret[(*num)++] = p;
+
+return ret;
+}
+
 void hvmloader_acpi_build_tables(struct acpi_config *config,
  unsigned int physical)
 {
@@ -998,6 +1045,11 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
 
 ctxt.min_alloc_byte_align = 16;
 
+ctxt.xs_ops.read = acpi_xs_read;
+ctxt.xs_ops.write = acpi_xs_write;
+ctxt.xs_ops.directory = acpi_xs_directory;
+ctxt.xs_opaque = NULL;
+
 acpi_build_tables(, config);
 
 hvm_param_set(HVM_PARAM_VM_GENERATION_ID_ADDR, config->vm_gid_addr);
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index e9fe6c6e79..37e62d93c0 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -225,6 +225,15 @@ const char *xenstore_read(const char *path, const char 
*default_resp);
  */
 int xenstore_write(const char *path, const char *value);
 
+/* Read a xenstore directory. Return NULL, or a nul-terminated string
+ * which contains all names of directory entries. Names are separated
+ * by '\0'. The returned string is in a static buffer, so only valid
+ * until the next xenstore/xenbus operation.  If @default_resp is
+ * specified, it is returned in preference to a NULL or empty string
+ * received from xenstore.
+ */
+const char *xenstore_directory(const char *path, uint32_t *len,
+   const char *default_resp);
 
 /* Get a HVM param.
  */
diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
index 2b89a56fce..387c0971e1 100644
--- a/tools/firmware/hvmloader/xenbus.c
+++ b/tools/firmware/hvmloader/xenbus.c
@@ -257,24 +257,16 @@ static int xenbus_recv(uint32_t *reply_len, const char **reply_data,
 return 0;
 }
 
-
-/* Read a xenstore key.  Returns a nul-terminated string (even if the XS
- * data wasn't nul-terminated) or NULL.  The returned string is in a
- * static buffer, so only valid until the next xenstore/xenbus operation.
- * If @default_resp is specified, it is returned in preference to a NULL or
- * empty string received from xenstore.
- */
-const char *xenstore_read(const char *path, const char *default_resp)
+static const char *xenstore_read_common(const char *path, uint32_t *len,
+const char *default_resp, bool is_dir)
 {
-uint32_t len = 0, type = 0;
+uint32_t type = 0, expected_type = is_dir ? XS_DIRECTORY : XS_READ;
 const char *answer = NULL;
 
-xenbus_send(XS_READ,
-path, strlen(path),
-"", 1, /* nul separator */
+xenbus_send(expected_type, path, strlen(path), "", 1, /* nul separator */
 NULL, 0);
 
-if ( xenbus_recv(&len, &answer, &type) || (type != XS_READ) )
+if ( xenbus_recv(len, &answer, &type) || type != expected_type )
 answer = NULL;
 
 if ( (default_resp != NULL) &&

[Xen-devel] [RFC XEN PATCH v3 33/39] tools/libacpi: add a simple AML builder

2017-09-10 Thread Haozhong Zhang
It is used by libacpi to generate SSDTs from ACPI namespace devices
built by the device model.
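
The builder works by prepending: each new byte is written at the front of the object under construction, shifting the existing contents up. A simplified, fixed-buffer model of that technique (the real code grows the buffer through the libacpi allocator instead of using a fixed array):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy growable region modelling the AML builder's allocation area. */
struct buf {
    uint8_t data[64];
    uint32_t used;
};

/* Insert @byte at offset @at, shifting the @used - @at bytes after it,
 * as build_prepend_byte() does; return 0 on success. */
static int prepend_byte(struct buf *b, uint32_t at, uint8_t byte)
{
    uint32_t tail = b->used - at;   /* bytes after the insertion point */

    if ( b->used + 1 > sizeof(b->data) )
        return -1;                  /* out of space */

    memmove(&b->data[at + 1], &b->data[at], tail);
    b->data[at] = byte;
    b->used++;

    return 0;
}
```

Prepending 'C', then 'B', then 'A' at offset 0 leaves "ABC" in the buffer, which is how nested AML objects end up with their opcodes in front of already-built bodies.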

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/firmware/hvmloader/Makefile |   3 +-
 tools/libacpi/aml_build.c | 326 ++
 tools/libacpi/aml_build.h | 116 ++
 tools/libxl/Makefile  |   3 +-
 4 files changed, 446 insertions(+), 2 deletions(-)
 create mode 100644 tools/libacpi/aml_build.c
 create mode 100644 tools/libacpi/aml_build.h

diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
index 7c4c0ce535..3e917507c8 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -76,11 +76,12 @@ smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
 
 ACPI_PATH = ../../libacpi
 DSDT_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c
-ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o
+ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o aml_build.o
 $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\"
 CFLAGS += -I$(ACPI_PATH)
 vpath build.c $(ACPI_PATH)
 vpath static_tables.c $(ACPI_PATH)
+vpath aml_build.c $(ACPI_PATH)
 OBJS += $(ACPI_OBJS)
 
 hvmloader: $(OBJS)
diff --git a/tools/libacpi/aml_build.c b/tools/libacpi/aml_build.c
new file mode 100644
index 00..9b4e28ad95
--- /dev/null
+++ b/tools/libacpi/aml_build.c
@@ -0,0 +1,326 @@
+/*
+ * tools/libacpi/aml_build.c
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include LIBACPI_STDUTILS
+#include "libacpi.h"
+#include "aml_build.h"
+
+#define AML_OP_SCOPE 0x10
+#define AML_OP_EXT   0x5B
+#define AML_OP_DEVICE0x82
+
+#define ACPI_NAMESEG_LEN 4
+
+struct aml_build_alloctor {
+struct acpi_ctxt *ctxt;
+uint8_t *buf;
+uint32_t capacity;
+uint32_t used;
+};
+static struct aml_build_alloctor alloc;
+
+static uint8_t *aml_buf_alloc(uint32_t size)
+{
+uint8_t *buf = NULL;
+struct acpi_ctxt *ctxt = alloc.ctxt;
+uint32_t alloc_size, alloc_align = ctxt->min_alloc_byte_align;
+uint32_t length = alloc.used + size;
+
+/* Overflow ... */
+if ( length < alloc.used )
+return NULL;
+
+if ( length <= alloc.capacity )
+{
+buf = alloc.buf + alloc.used;
+alloc.used += size;
+}
+else
+{
+alloc_size = length - alloc.capacity;
+alloc_size = (alloc_size + alloc_align) & ~(alloc_align - 1);
+buf = ctxt->mem_ops.alloc(ctxt, alloc_size, alloc_align);
+
+if ( buf &&
+ buf == alloc.buf + alloc.capacity /* cont to existing buf */ )
+{
+alloc.capacity += alloc_size;
+buf = alloc.buf + alloc.used;
+alloc.used += size;
+}
+else
+buf = NULL;
+}
+
+return buf;
+}
+
+static uint32_t get_package_length(uint8_t *pkg)
+{
+uint32_t len;
+
+len = pkg - alloc.buf;
+len = alloc.used - len;
+
+return len;
+}
+
+/*
+ * On success, an object in the following form is stored at @buf.
+ *   @byte
+ *   the original content in @buf
+ */
+static int build_prepend_byte(uint8_t *buf, uint8_t byte)
+{
+uint32_t len;
+
+len = buf - alloc.buf;
+len = alloc.used - len;
+
+if ( !aml_buf_alloc(sizeof(uint8_t)) )
+return -1;
+
+if ( len )
+memmove(buf + 1, buf, len);
+buf[0] = byte;
+
+return 0;
+}
+
+/*
+ * On success, an object in the following form is stored at @buf.
+ *   AML encoding of four-character @name
+ *   the original content in @buf
+ *
+ * Refer to  ACPI spec 6.1, Sec 20.2.2 "Name Objects Encoding".
+ *
+ * XXX: names of multiple segments (e.g. X.Y.Z) are not supported
+ */
+static int build_prepend_name(uint8_t *buf, const char *name)
+{
+uint8_t *p = buf;
+const char *s = name;
+uint32_t len, name_len;
+
+while ( *s == '\\' || *s == '^' )
+{
+if ( build_prepend_byte(p, (uint8_t) *s) )
+return -1;
+++p;
+++s;
+}
+
+if ( !*s )
+return buil

[Xen-devel] [RFC XEN PATCH v3 20/39] tools/xen-ndctl: add option '--mgmt' to command 'list'

2017-09-10 Thread Haozhong Zhang
If the option '--mgmt' is present, the command 'list' will list all
PMEM regions for management usage.
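
The command selects its behaviour through a small option-to-handler dispatch table, mirroring the list_hndrs[] array in the diff. A self-contained model, with stub handlers standing in for the real xen-ndctl ones:

```c
#include <assert.h>
#include <string.h>

typedef int (*list_handler)(void);

/* Stub handlers; the real ones issue xc_nvdimm_pmem_* calls. */
static int handle_raw(void)  { return 1; }
static int handle_mgmt(void) { return 2; }

static const struct {
    const char *option;
    list_handler handler;
} handlers[] = {
    { "--raw",  handle_raw  },
    { "--mgmt", handle_mgmt },
};

/* Run the handler matching @opt; return -1 for an unknown option. */
static int dispatch(const char *opt)
{
    unsigned int i;

    for ( i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++ )
        if ( !strcmp(opt, handlers[i].option) )
            return handlers[i].handler();

    return -1;
}
```

Adding a new option then only requires appending one table entry, which is exactly what this patch does for '--mgmt'.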

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/misc/xen-ndctl.c | 39 +--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index 1289a83dbe..058f8ccaf5 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -57,9 +57,10 @@ static const struct xen_ndctl_cmd
 
 {
 .name= "list",
-.syntax  = "[--all | --raw ]",
+.syntax  = "[--all | --raw | --mgmt]",
 .help= "--all: the default option, list all PMEM regions of following types.\n"
-   "--raw: list all PMEM regions detected by Xen hypervisor.\n",
+   "--raw: list all PMEM regions detected by Xen hypervisor.\n"
+   "--mgmt: list all PMEM regions for management usage.\n",
 .handler = handle_list,
 .need_xc = true,
 },
@@ -162,12 +163,46 @@ static int handle_list_raw(void)
 return rc;
 }
 
+static int handle_list_mgmt(void)
+{
+int rc;
+unsigned int nr = 0, i;
+xen_sysctl_nvdimm_pmem_mgmt_region_t *mgmt_list;
+
+rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_MGMT, &nr);
+if ( rc )
+{
+fprintf(stderr, "Cannot get the number of PMEM regions: %s.\n",
+strerror(-rc));
+return rc;
+}
+
+mgmt_list = malloc(nr * sizeof(*mgmt_list));
+if ( !mgmt_list )
+return -ENOMEM;
+
+rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_MGMT, mgmt_list, &nr);
+if ( rc )
+goto out;
+
+printf("Management PMEM regions:\n");
+for ( i = 0; i < nr; i++ )
+printf(" %u: MFN 0x%lx - 0x%lx, used 0x%lx\n",
+   i, mgmt_list[i].smfn, mgmt_list[i].emfn, mgmt_list[i].used_mfns);
+
+ out:
+free(mgmt_list);
+
+return rc;
+}
+
 static const struct list_handlers {
 const char *option;
 int (*handler)(void);
 } list_hndrs[] =
 {
 { "--raw", handle_list_raw },
+{ "--mgmt", handle_list_mgmt },
 };
 
 static const unsigned int nr_list_hndrs =
-- 
2.14.1




[Xen-devel] [RFC XEN PATCH v3 24/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions

2017-09-10 Thread Haozhong Zhang
Allow XEN_SYSCTL_nvdimm_pmem_get_regions to return a list of data PMEM
regions.
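
The libxc side follows the usual two-step pattern: query the region count, allocate a buffer, then fetch the entries. A standalone sketch of that pattern, with stubs standing in for the xc_nvdimm_pmem_* hypercall wrappers:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct region { uint64_t smfn, emfn; };

/* Stubs modelling xc_nvdimm_pmem_get_regions_nr / _get_regions. */
static int stub_get_nr(unsigned int *nr)
{
    *nr = 2;
    return 0;
}

static int stub_get(struct region *buf, unsigned int *nr)
{
    unsigned int i;

    for ( i = 0; i < *nr; i++ )
    {
        buf[i].smfn = (uint64_t)i * 0x1000;
        buf[i].emfn = buf[i].smfn + 0x1000;
    }
    return 0;
}

/* Count, allocate, fetch; caller frees the returned buffer. */
static struct region *fetch_regions(unsigned int *nr_out)
{
    unsigned int nr = 0;
    struct region *buf;

    if ( stub_get_nr(&nr) || !nr )
        return NULL;

    buf = malloc(nr * sizeof(*buf));
    if ( !buf )
        return NULL;

    if ( stub_get(buf, &nr) )
    {
        free(buf);
        return NULL;
    }

    *nr_out = nr;
    return buf;
}
```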

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
---
 tools/libxc/xc_misc.c   |  8 
 xen/common/pmem.c   | 46 +
 xen/include/public/sysctl.h | 12 
 3 files changed, 66 insertions(+)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index db74df853a..93a1f8fdc5 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -944,6 +944,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
 size = sizeof(xen_sysctl_nvdimm_pmem_mgmt_region_t) * max;
 break;
 
+case PMEM_REGION_TYPE_DATA:
+size = sizeof(xen_sysctl_nvdimm_pmem_data_region_t) * max;
+break;
+
 default:
 return -EINVAL;
 }
@@ -969,6 +973,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
 set_xen_guest_handle(regions->u_buffer.mgmt_regions, buffer);
 break;
 
+case PMEM_REGION_TYPE_DATA:
+set_xen_guest_handle(regions->u_buffer.data_regions, buffer);
+break;
+
 default:
 rc = -EINVAL;
 goto out;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index cbe557c220..ed4a014c30 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -251,6 +251,48 @@ static int pmem_get_mgmt_regions(
 return rc;
 }
 
+static int pmem_get_data_regions(
+XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_data_region_t) regions,
+unsigned int *num_regions)
+{
+struct list_head *cur;
+unsigned int nr = 0, max = *num_regions;
+xen_sysctl_nvdimm_pmem_data_region_t region;
+int rc = 0;
+
+if ( !guest_handle_okay(regions, max * sizeof(region)) )
+return -EINVAL;
+
+spin_lock(&pmem_data_lock);
+
+list_for_each(cur, &pmem_data_regions)
+{
+struct pmem *pmem = list_entry(cur, struct pmem, link);
+
+if ( nr >= max )
+break;
+
+region.smfn = pmem->smfn;
+region.emfn = pmem->emfn;
+region.mgmt_smfn = pmem->u.data.mgmt_smfn;
+region.mgmt_emfn = pmem->u.data.mgmt_emfn;
+
+if ( copy_to_guest_offset(regions, nr, &region, 1) )
+{
+rc = -EFAULT;
+break;
+}
+
+nr++;
+}
+
+spin_unlock(&pmem_data_lock);
+
+*num_regions = nr;
+
+return rc;
+}
+
 static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 {
 unsigned int type = regions->type, max = regions->num_regions;
@@ -269,6 +311,10 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 rc = pmem_get_mgmt_regions(regions->u_buffer.mgmt_regions, &nr);
 break;
 
+case PMEM_REGION_TYPE_DATA:
+rc = pmem_get_data_regions(regions->u_buffer.data_regions, &nr);
+break;
+
 default:
 rc = -EINVAL;
 }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index d7c12f23fb..8595ea438a 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1141,6 +1141,16 @@ struct xen_sysctl_nvdimm_pmem_mgmt_region {
 typedef struct xen_sysctl_nvdimm_pmem_mgmt_region xen_sysctl_nvdimm_pmem_mgmt_region_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_mgmt_region_t);
 
+/* PMEM_REGION_TYPE_DATA */
+struct xen_sysctl_nvdimm_pmem_data_region {
+uint64_t smfn;
+uint64_t emfn;
+uint64_t mgmt_smfn;
+uint64_t mgmt_emfn;
+};
+typedef struct xen_sysctl_nvdimm_pmem_data_region xen_sysctl_nvdimm_pmem_data_region_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_data_region_t);
+
 /* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
 struct xen_sysctl_nvdimm_pmem_regions_nr {
 uint8_t type; /* IN: one of PMEM_REGION_TYPE_* */
@@ -1161,6 +1171,8 @@ struct xen_sysctl_nvdimm_pmem_regions {
 XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) raw_regions;
 /* if type == PMEM_REGION_TYPE_MGMT */
 XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) mgmt_regions;
+/* if type == PMEM_REGION_TYPE_DATA */
+XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_data_region_t) data_regions;
 } u_buffer;   /* IN: the guest handler where the entries of PMEM
  regions of the type @type are returned */
 };
-- 
2.14.1




[Xen-devel] [RFC XEN PATCH v3 31/39] tools/libacpi: add callback to translate GPA to GVA

2017-09-10 Thread Haozhong Zhang
The location of the ACPI blobs passed from the device model is given as
a guest physical address. libacpi needs to convert that guest physical
address to a guest virtual address before it can access those ACPI
blobs.
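
Both directions of the translation are simple linear offsets within one contiguously mapped window, as in libxl_x86_acpi.c. A standalone sketch under that assumption (the struct and field names here are illustrative, not the libxl ones):

```c
#include <assert.h>
#include <stdint.h>

/* One contiguously mapped window: guest-physical base -> builder-virtual
 * base, as libxl's alloc_base_paddr / alloc_base_vaddr pair. */
struct acpi_map {
    uint64_t base_paddr;   /* guest-physical base of the window */
    uintptr_t base_vaddr;  /* corresponding virtual base in the builder */
};

/* Guest physical -> builder virtual. */
static uintptr_t p2v(const struct acpi_map *m, uint64_t p)
{
    return (uintptr_t)(p - m->base_paddr) + m->base_vaddr;
}

/* Builder virtual -> guest physical (inverse of p2v). */
static uint64_t v2p(const struct acpi_map *m, uintptr_t v)
{
    return (uint64_t)(v - m->base_vaddr) + m->base_paddr;
}
```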

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/firmware/hvmloader/util.c |  6 ++
 tools/firmware/hvmloader/util.h |  1 +
 tools/libacpi/libacpi.h |  1 +
 tools/libxl/libxl_x86_acpi.c| 10 ++
 4 files changed, 18 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index c2218d9fcb..2f8a4654b0 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -871,6 +871,11 @@ static unsigned long acpi_v2p(struct acpi_ctxt *ctxt, void *v)
 return virt_to_phys(v);
 }
 
+static void *acpi_p2v(struct acpi_ctxt *ctxt, unsigned long p)
+{
+return phys_to_virt(p);
+}
+
 static void *acpi_mem_alloc(struct acpi_ctxt *ctxt,
 uint32_t size, uint32_t align)
 {
@@ -989,6 +994,7 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
 ctxt.mem_ops.alloc = acpi_mem_alloc;
 ctxt.mem_ops.free = acpi_mem_free;
 ctxt.mem_ops.v2p = acpi_v2p;
+ctxt.mem_ops.p2v = acpi_p2v;
 
 ctxt.min_alloc_byte_align = 16;
 
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 2ef854eb8f..e9fe6c6e79 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -200,6 +200,7 @@ xen_pfn_t mem_hole_alloc(uint32_t nr_mfns);
 /* Allocate memory in a reserved region below 4GB. */
 void *mem_alloc(uint32_t size, uint32_t align);
 #define virt_to_phys(v) ((unsigned long)(v))
+#define phys_to_virt(p) ((void *)(p))
 
 /* Allocate memory in a scratch region */
 void *scratch_alloc(uint32_t size, uint32_t align);
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 157f63f7bc..f5a1c384bc 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -51,6 +51,7 @@ struct acpi_ctxt {
 void *(*alloc)(struct acpi_ctxt *ctxt, uint32_t size, uint32_t align);
 void (*free)(struct acpi_ctxt *ctxt, void *v, uint32_t size);
 unsigned long (*v2p)(struct acpi_ctxt *ctxt, void *v);
+void *(*p2v)(struct acpi_ctxt *ctxt, unsigned long p);
 } mem_ops;
 
 uint32_t min_alloc_byte_align; /* minimum alignment used by mem_ops.alloc */
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index 3b79b2179b..b14136949c 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -52,6 +52,15 @@ static unsigned long virt_to_phys(struct acpi_ctxt *ctxt, void *v)
 libxl_ctxt->alloc_base_paddr);
 }
 
+static void *phys_to_virt(struct acpi_ctxt *ctxt, unsigned long p)
+{
+struct libxl_acpi_ctxt *libxl_ctxt =
+CONTAINER_OF(ctxt, struct libxl_acpi_ctxt, c);
+
+return (void *)((p - libxl_ctxt->alloc_base_paddr) +
+libxl_ctxt->alloc_base_vaddr);
+}
+
 static void *mem_alloc(struct acpi_ctxt *ctxt,
uint32_t size, uint32_t align)
 {
@@ -181,6 +190,7 @@ int libxl__dom_load_acpi(libxl__gc *gc,
 
 libxl_ctxt.c.mem_ops.alloc = mem_alloc;
 libxl_ctxt.c.mem_ops.v2p = virt_to_phys;
+libxl_ctxt.c.mem_ops.p2v = phys_to_virt;
 libxl_ctxt.c.mem_ops.free = acpi_mem_free;
 
 libxl_ctxt.c.min_alloc_byte_align = 16;
-- 
2.14.1




[Xen-devel] [RFC XEN PATCH v3 35/39] tools/libacpi: load ACPI built by the device model

2017-09-10 Thread Haozhong Zhang
ACPI tables built by the device model, whose signatures do not
conflict with tables built by Xen except SSDT, are loaded after ACPI
tables built by Xen.

ACPI namespace devices built by the device model, whose names do not
conflict with devices built by Xen, are assembled and placed in SSDTs
after ACPI tables built by Xen.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/firmware/hvmloader/util.c |  15 +++
 tools/libacpi/acpi2_0.h |   2 +
 tools/libacpi/build.c   | 237 
 tools/libacpi/libacpi.h |   5 +
 4 files changed, 259 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 5b8a4ee9d0..0468fea490 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -1019,6 +1019,21 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
 if ( !strncmp(xenstore_read("platform/acpi_laptop_slate", "0"), "1", 1)  )
 config->table_flags |= ACPI_HAS_SSDT_LAPTOP_SLATE;
 
+s = xenstore_read(HVM_XS_DM_ACPI_ADDRESS, NULL);
+if ( s )
+{
+config->dm.addr = strtoll(s, NULL, 0);
+
+s = xenstore_read(HVM_XS_DM_ACPI_LENGTH, NULL);
+if ( s )
+{
+config->dm.length = strtoll(s, NULL, 0);
+config->table_flags |= ACPI_HAS_DM;
+}
+else
+config->dm.addr = 0;
+}
+
 config->table_flags |= (ACPI_HAS_TCPA | ACPI_HAS_IOAPIC |
 ACPI_HAS_WAET | ACPI_HAS_PMTIMER |
 ACPI_HAS_BUTTONS | ACPI_HAS_VGA |
diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 2619ba32db..365825e6bc 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -435,6 +435,7 @@ struct acpi_20_slit {
 #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
 #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
 #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
+#define ACPI_2_0_SSDT_SIGNATURE ASCII32('S','S','D','T')
 
 /*
  * Table revision numbers.
@@ -449,6 +450,7 @@ struct acpi_20_slit {
 #define ACPI_1_0_FADT_REVISION 0x01
 #define ACPI_2_0_SRAT_REVISION 0x01
 #define ACPI_2_0_SLIT_REVISION 0x01
+#define ACPI_2_0_SSDT_REVISION 0x02
 
 #pragma pack ()
 
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index 493ca48025..8ec1dfda5f 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -15,6 +15,7 @@
 
 #include LIBACPI_STDUTILS
 #include "acpi2_0.h"
+#include "aml_build.h"
 #include "libacpi.h"
 #include "ssdt_s3.h"
 #include "ssdt_s4.h"
@@ -56,6 +57,9 @@ struct acpi_info {
 uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */
 };
 
+#define DM_ACPI_BLOB_TYPE_TABLE 0 /* ACPI table */
+#define DM_ACPI_BLOB_TYPE_NSDEV 1 /* AML of an ACPI namespace device */
+
 /* ACPI tables of following signatures should not appear in DM ACPI */
 static uint64_t dm_acpi_signature_blacklist[64];
 /* ACPI namespace devices of following names should not appear in DM ACPI */
@@ -141,6 +145,233 @@ static void set_checksum(
 p[checksum_offset] = -sum;
 }
 
+static bool has_dm_tables(struct acpi_ctxt *ctxt,
+  const struct acpi_config *config)
+{
+char **dir;
+unsigned int num;
+
+if ( !(config->table_flags & ACPI_HAS_DM) || !config->dm.addr )
+return false;
+
+dir = ctxt->xs_ops.directory(ctxt, HVM_XS_DM_ACPI_ROOT, );
+if ( !dir || !num )
+return false;
+
+return true;
+}
+
+/* Return true if no collision is found. */
+static bool check_signature_collision(uint64_t sig)
+{
+unsigned int i;
+for ( i = 0; i < ARRAY_SIZE(dm_acpi_signature_blacklist); i++ )
+{
+if ( sig == dm_acpi_signature_blacklist[i] )
+return false;
+}
+return true;
+}
+
+/* Return true if no collision is found. */
+static bool check_devname_collision(const char *name)
+{
+unsigned int i;
+for ( i = 0; i < ARRAY_SIZE(dm_acpi_devname_blacklist); i++ )
+{
+if ( !strncmp(name, dm_acpi_devname_blacklist[i], 4) )
+return false;
+}
+return true;
+}
+
+static const char *xs_read_dm_acpi_blob_key(struct acpi_ctxt *ctxt,
+const char *name, const char *key)
+{
+/*
+ * @name is supposed to be 4 characters at most, and the longest @key
+ * so far is 'address' (7), so 30 characters is enough to hold the
+ * longest path HVM_XS_DM_ACPI_ROOT/name/key.
+ */
+#define DM_ACPI_BLOB_PATH_MAX_LENGTH   30
+char path[DM_ACPI_BLOB_PATH_MAX_LENGTH];
+snprintf(path, DM_ACPI_BLOB_PATH_MAX_LENGTH, HVM_XS_DM_

[Xen-devel] [RFC XEN PATCH v3 18/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions_nr

2017-09-10 Thread Haozhong Zhang
Allow XEN_SYSCTL_nvdimm_pmem_get_regions_nr to return the number of
management PMEM regions.
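
As a quick illustration of the widened check, the standalone sketch below mirrors the argument validation in xc_nvdimm_pmem_get_regions_nr() after this patch (the constants and the helper name are assumptions for the sketch, not the real libxc API):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values mirroring the PMEM region types used in this series. */
#define PMEM_REGION_TYPE_RAW  0
#define PMEM_REGION_TYPE_MGMT 1

/*
 * Mirror of the validation in xc_nvdimm_pmem_get_regions_nr():
 * after this patch both RAW and MGMT region types are accepted.
 */
static int pmem_get_regions_nr_check(uint8_t type, const uint32_t *nr)
{
    if ( !nr ||
         (type != PMEM_REGION_TYPE_RAW &&
          type != PMEM_REGION_TYPE_MGMT) )
        return -EINVAL;
    return 0;
}
```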

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
---
 tools/libxc/xc_misc.c | 4 +++-
 xen/common/pmem.c | 4 
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index bebe6d04c8..4b5558aaa5 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -894,7 +894,9 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
 xen_sysctl_nvdimm_op_t *nvdimm = &sysctl.u.nvdimm;
 int rc;
 
-if ( !nr || type != PMEM_REGION_TYPE_RAW )
+if ( !nr ||
+ (type != PMEM_REGION_TYPE_RAW &&
+  type != PMEM_REGION_TYPE_MGMT) )
 return -EINVAL;
 
 sysctl.cmd = XEN_SYSCTL_nvdimm_op;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 7a081c2879..54b3e7119a 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -142,6 +142,10 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
 regions_nr->num_regions = nr_raw_regions;
 break;
 
+case PMEM_REGION_TYPE_MGMT:
+regions_nr->num_regions = nr_mgmt_regions;
+break;
+
 default:
 rc = -EINVAL;
 }
-- 
2.14.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [RFC XEN PATCH v3 29/39] tools: reserve guest memory for ACPI from device model

2017-09-10 Thread Haozhong Zhang
Some virtual devices (e.g. NVDIMM) require complex ACPI tables and
definition blocks (in AML), which a device model (e.g. QEMU) is
already able to construct. Instead of introducing a redundant
implementation in Xen, we would like to reuse the device model to
construct those ACPI artifacts.

This commit allows Xen to reserve an area in the guest memory for the
device model to pass its ACPI tables and definition blocks to guest,
which will be loaded by hvmloader. The base guest physical address and
the size of the reserved area are passed to the device model via
XenStore keys hvmloader/dm-acpi/{address, length}. An xl config
"dm_acpi_pages = N" is added to specify the number of reserved guest
memory pages.
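
The values written to the two XenStore keys are derived from the reserved segment. This hedged sketch shows the derivation, assuming a 4 KiB page size for HVM guests (the helper name is an assumption, not the real libxl code):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE_ASSUMED 4096ULL /* assumed XC_DOM_PAGE_SIZE() for HVM */

/*
 * Derive the values written to hvmloader/dm-acpi/{address,length}
 * from the first PFN and the page count of the reserved segment.
 * Returns 0 on success, -1 if the area does not lie below 4 GiB
 * (hvmloader expects it there).
 */
static int dm_acpi_xs_values(uint64_t pfn, uint64_t pages,
                             char *addr, size_t addr_len,
                             char *len, size_t len_len)
{
    uint64_t base = pfn * PAGE_SIZE_ASSUMED;

    if (base >= 0x100000000ULL)
        return -1;

    snprintf(addr, addr_len, "0x%" PRIx64, base);
    snprintf(len, len_len, "0x%" PRIx64, pages * PAGE_SIZE_ASSUMED);
    return 0;
}
```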

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/libxc/include/xc_dom.h|  1 +
 tools/libxc/xc_dom_x86.c| 13 +
 tools/libxl/libxl_dom.c | 25 +
 tools/libxl/libxl_types.idl |  1 +
 tools/xl/xl_parse.c | 17 -
 xen/include/public/hvm/hvm_xs_strings.h |  8 
 6 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index ce47058c41..7c541576e7 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -93,6 +93,7 @@ struct xc_dom_image {
 struct xc_dom_seg pgtables_seg;
 struct xc_dom_seg devicetree_seg;
 struct xc_dom_seg start_info_seg; /* HVMlite only */
+struct xc_dom_seg dm_acpi_seg;/* reserved PFNs for DM ACPI */
 xen_pfn_t start_info_pfn;
 xen_pfn_t console_pfn;
 xen_pfn_t xenstore_pfn;
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..8755350295 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -674,6 +674,19 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
  ioreq_server_pfn(0));
 xc_hvm_param_set(xch, domid, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
  NR_IOREQ_SERVER_PAGES);
+
+if ( dom->dm_acpi_seg.pages )
+{
+size_t acpi_size = dom->dm_acpi_seg.pages * XC_DOM_PAGE_SIZE(dom);
+
+rc = xc_dom_alloc_segment(dom, &dom->dm_acpi_seg, "DM ACPI",
+  0, acpi_size);
+if ( rc != 0 )
+{
+DOMPRINTF("Unable to reserve memory for DM ACPI");
+goto out;
+}
+}
 }
 
 rc = xc_dom_alloc_segment(dom, &dom->start_info_seg,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index f54fd49a73..bad1719892 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -897,6 +897,29 @@ static int hvm_build_set_xs_values(libxl__gc *gc,
 goto err;
 }
 
+if (dom->dm_acpi_seg.pages) {
+uint64_t guest_addr_out = dom->dm_acpi_seg.pfn * XC_DOM_PAGE_SIZE(dom);
+
+if (guest_addr_out >= 0x100000000ULL) {
+LOG(ERROR,
+"Guest address of DM ACPI is 0x%"PRIx64", but expected below 4G",
+guest_addr_out);
+goto err;
+}
+
+path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_ADDRESS, domid);
+ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64, guest_addr_out);
+if (ret)
+goto err;
+
+path = GCSPRINTF("/local/domain/%d/"HVM_XS_DM_ACPI_LENGTH, domid);
+ret = libxl__xs_printf(gc, XBT_NULL, path, "0x%"PRIx64,
+   (uint64_t)(dom->dm_acpi_seg.pages *
+  XC_DOM_PAGE_SIZE(dom)));
+if (ret)
+goto err;
+}
+
 return 0;
 
 err:
@@ -1184,6 +1207,8 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 dom->vnode_to_pnode[i] = info->vnuma_nodes[i].pnode;
 }
 
+dom->dm_acpi_seg.pages = info->u.hvm.dm_acpi_pages;
+
 rc = libxl__build_dom(gc, domid, info, state, dom);
 if (rc != 0)
 goto out;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 173d70acec..4acc0457f4 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -565,6 +565,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
("rdm", libxl_rdm_reserve),
("rdm_mem_boundary_memkb", MemKB),
("mca_caps", uint64),
+   ("dm_acpi_pages",integer),
])),
  ("pv", Struct(None, [("kernel", string),

[Xen-devel] [RFC XEN PATCH v3 38/39] tools/libxl: initiate PMEM mapping via QMP callback

2017-09-10 Thread Haozhong Zhang
The base guest physical address of each vNVDIMM device is decided by
QEMU. Add a QMP callback to get the base address from QEMU and query Xen
hypervisor to map host PMEM pages to that address.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/libxl/libxl_qmp.c | 130 
 tools/libxl/libxl_vnvdimm.c |  30 ++
 tools/libxl/libxl_vnvdimm.h |  30 ++
 3 files changed, 190 insertions(+)
 create mode 100644 tools/libxl/libxl_vnvdimm.h

diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index e1eb47c1d2..299f9c8260 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -26,6 +26,7 @@
 
 #include "_libxl_list.h"
 #include "libxl_internal.h"
+#include "libxl_vnvdimm.h"
 
 /* #define DEBUG_RECEIVED */
 
@@ -1170,6 +1171,127 @@ int libxl_qemu_monitor_command(libxl_ctx *ctx, uint32_t domid,
 return rc;
 }
 
+#if defined(__linux__)
+
+static int qmp_register_vnvdimm_callback(libxl__qmp_handler *qmp,
+ const libxl__json_object *o,
+ void *arg)
+{
+GC_INIT(qmp->ctx);
+const libxl_domain_config *guest_config = arg;
+const libxl_device_vnvdimm *vnvdimm;
+const libxl__json_object *obj, *sub_map, *sub_obj;
+const char *id, *expected_id;
+unsigned int i, slot;
+unsigned long gpa, size, mfn, gpfn, nr_pages;
+int rc = 0;
+
+for (i = 0; (obj = libxl__json_array_get(o, i)); i++) {
+if (!libxl__json_object_is_map(obj))
+continue;
+
+sub_map = libxl__json_map_get("data", obj, JSON_MAP);
+if (!sub_map)
+continue;
+
+sub_obj = libxl__json_map_get("slot", sub_map, JSON_INTEGER);
+slot = libxl__json_object_get_integer(sub_obj);
+if (slot > guest_config->num_vnvdimms) {
+LOG(ERROR,
+"Invalid QEMU memory device slot %u, expecting less than %u",
+slot, guest_config->num_vnvdimms);
+rc = -ERROR_INVAL;
+goto out;
+}
+vnvdimm = &guest_config->vnvdimms[slot];
+
+/*
+ * Double check whether it's an NVDIMM memory device, though
+ * all memory devices in QEMU on Xen are for vNVDIMM.
+ */
+expected_id = libxl__sprintf(gc, "xen_nvdimm%u", slot + 1);
+if (!expected_id) {
+LOG(ERROR, "Cannot build device id");
+rc = -ERROR_FAIL;
+goto out;
+}
+sub_obj = libxl__json_map_get("id", sub_map, JSON_STRING);
+id = libxl__json_object_get_string(sub_obj);
+if (!id || strncmp(id, expected_id, strlen(expected_id))) {
+LOG(ERROR,
+"Invalid QEMU memory device id %s, expecting %s",
+id, expected_id);
+rc = -ERROR_FAIL;
+goto out;
+}
+
+sub_obj = libxl__json_map_get("addr", sub_map, JSON_INTEGER);
+gpa = libxl__json_object_get_integer(sub_obj);
+sub_obj = libxl__json_map_get("size", sub_map, JSON_INTEGER);
+size = libxl__json_object_get_integer(sub_obj);
+if ((gpa | size) & ~XC_PAGE_MASK) {
+LOG(ERROR,
+"Invalid address 0x%lx or size 0x%lx of QEMU memory device %s, 
"
+"not aligned to 0x%lx",
+gpa, size, id, XC_PAGE_SIZE);
+rc = -ERROR_INVAL;
+goto out;
+}
+gpfn = gpa >> XC_PAGE_SHIFT;
+
+nr_pages = size >> XC_PAGE_SHIFT;
+if (nr_pages > vnvdimm->nr_pages) {
+LOG(ERROR,
+"Invalid size 0x%lx of QEMU memory device %s, "
+"expecting no larger than 0x%lx",
+size, id, vnvdimm->nr_pages << XC_PAGE_SHIFT);
+rc = -ERROR_INVAL;
+goto out;
+}
+
+switch (vnvdimm->backend_type) {
+case LIBXL_VNVDIMM_BACKEND_TYPE_MFN:
+mfn = vnvdimm->u.mfn;
+break;
+
+default:
+LOG(ERROR, "Invalid NVDIMM backend type %u", 
vnvdimm->backend_type);
+rc = -ERROR_INVAL;
+goto out;
+}
+
+rc = libxl_vnvdimm_add_pages(gc, qmp->domid, mfn, gpfn, nr_pages);
+if (rc) {
+LOG(ERROR,
+"Cannot map PMEM pages for QEMU memory device %s, "
+"mfn 0x%lx, gpfn 0x%lx, nr 0x%lx, rc %d",
+id, mfn, gpfn, nr_pages, rc);
+rc = -ERROR_FAIL;
+goto out;
+}
+}
+
+ out:
+GC_FREE;
+return rc;
+}
+
+static int libxl__qmp_query_vnvdimms(libxl_

[Xen-devel] [RFC XEN PATCH v3 27/39] xen/pmem: release PMEM pages on HVM domain destruction

2017-09-10 Thread Haozhong Zhang
A new step RELMEM_pmem is added and taken before RELMEM_xen to release
all PMEM pages mapped to an HVM domain.
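
The ordering matters: the new phase must run before RELMEM_xen so that PMEM pages are released first. A minimal sketch of the assumed phase progression (the enum names mirror this patch; the stepping helper is an illustration only):

```c
#include <assert.h>

/* Assumed mirror of the relinquish phases after this patch. */
enum relmem {
    RELMEM_not_started,
    RELMEM_shared,
    RELMEM_pmem,   /* new: release PMEM pages before Xen-heap pages */
    RELMEM_xen,
    RELMEM_l4,
    RELMEM_l3,
    RELMEM_l2,
    RELMEM_done,
};

/* Advance one (preemptible) relinquish phase. */
static enum relmem relmem_next(enum relmem cur)
{
    return cur < RELMEM_done ? (enum relmem)(cur + 1) : RELMEM_done;
}
```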

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: George Dunlap <george.dun...@eu.citrix.com>
---
 xen/arch/x86/domain.c| 32 
 xen/arch/x86/mm.c|  9 +++--
 xen/common/pmem.c| 10 ++
 xen/include/asm-x86/domain.h |  1 +
 xen/include/xen/pmem.h   |  6 ++
 5 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dbddc536d3..1c4e788780 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1755,11 +1755,15 @@ static int relinquish_memory(
 {
 struct page_info  *page;
 unsigned long x, y;
+bool  is_pmem_list = (list == &d->pmem_page_list);
 int   ret = 0;
 
 /* Use a recursive lock, as we may enter 'free_domheap_page'. */
 spin_lock_recursive(&d->page_alloc_lock);
 
+if ( is_pmem_list )
+spin_lock(&d->pmem_lock);
+
 while ( (page = page_list_remove_head(list)) )
 {
 /* Grab a reference to the page so it won't disappear from under us. */
@@ -1841,8 +1845,9 @@ static int relinquish_memory(
 }
 }
 
-/* Put the page on the list and /then/ potentially free it. */
-page_list_add_tail(page, &d->arch.relmem_list);
+if ( !is_pmem_list )
+/* Put the page on the list and /then/ potentially free it. */
+page_list_add_tail(page, &d->arch.relmem_list);
 put_page(page);
 
 if ( hypercall_preempt_check() )
@@ -1852,10 +1857,13 @@ static int relinquish_memory(
 }
 }
 
-/* list is empty at this point. */
-page_list_move(list, &d->arch.relmem_list);
+if ( !is_pmem_list )
+/* list is empty at this point. */
+page_list_move(list, &d->arch.relmem_list);
 
  out:
+if ( is_pmem_list )
+spin_unlock(&d->pmem_lock);
 spin_unlock_recursive(&d->page_alloc_lock);
 return ret;
 }
@@ -1922,13 +1930,29 @@ int domain_relinquish_resources(struct domain *d)
 return ret;
 }
 
+#ifndef CONFIG_NVDIMM_PMEM
 d->arch.relmem = RELMEM_xen;
+#else
+d->arch.relmem = RELMEM_pmem;
+#endif
 
 spin_lock(&d->page_alloc_lock);
 page_list_splice(&d->arch.relmem_list, &d->page_list);
 INIT_PAGE_LIST_HEAD(&d->arch.relmem_list);
 spin_unlock(&d->page_alloc_lock);
 
+#ifdef CONFIG_NVDIMM_PMEM
+/* Fallthrough. Relinquish every page of PMEM. */
+case RELMEM_pmem:
+if ( is_hvm_domain(d) )
+{
+ret = relinquish_memory(d, &d->pmem_page_list, ~0UL);
+if ( ret )
+return ret;
+}
+d->arch.relmem = RELMEM_xen;
+#endif
+
 /* Fallthrough. Relinquish every page of memory. */
 case RELMEM_xen:
 ret = relinquish_memory(d, &d->xenpage_list, ~0UL);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 93ccf198c9..26f9e5a13e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -106,6 +106,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2341,8 +2342,12 @@ void put_page(struct page_info *page)
 
 if ( unlikely((nx & PGC_count_mask) == 0) )
 {
-if ( !is_pmem_page(page) /* PMEM page is not allocated from Xen heap. */
- && cleanup_page_cacheattr(page) == 0 )
+#ifdef CONFIG_NVDIMM_PMEM
+if ( is_pmem_page(page) )
+pmem_page_cleanup(page);
+else
+#endif
+if ( cleanup_page_cacheattr(page) == 0 )
 free_domheap_page(page);
 else
 gdprintk(XENLOG_WARNING,
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 2f9ad64a26..8b9378dce6 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -741,6 +741,16 @@ int pmem_populate(struct xen_pmem_map_args *args)
 return rc;
 }
 
+void pmem_page_cleanup(struct page_info *page)
+{
+ASSERT(is_pmem_page(page));
+ASSERT((page->count_info & PGC_count_mask) == 0);
+
+page->count_info = PGC_pmem_page | PGC_state_free;
+page_set_owner(page, NULL);
+set_gpfn_from_mfn(page_to_mfn(page), INVALID_M2P_ENTRY);
+}
+
 int __init pmem_dom0_setup_permission(struct domain *d)
 {
 struct list_head *cur;
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index fb8bf17458..8322546b5d 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -303,6 +303,7 @@ struct arch_domain
 enum {
 RELMEM_not_started,
 RELMEM_shared,
+RELMEM_pmem,
 RELMEM_xen,
 RELMEM_l4,
 RELMEM_l3,
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index 2dab90530b..dfbc412065 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/in

[Xen-devel] [RFC XEN PATCH v3 02/39] x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table()

2017-09-10 Thread Haozhong Zhang
Replace pdx_to_page(pfn_to_pdx(pfn)) by mfn_to_page(pfn), which is
identical to the former.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 6c5221f90c..c93383d7d9 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -720,12 +720,11 @@ static void cleanup_frame_table(struct mem_hotadd_info *info)
 spfn = info->spfn;
 epfn = info->epfn;
 
-sva = (unsigned long)pdx_to_page(pfn_to_pdx(spfn));
-eva = (unsigned long)pdx_to_page(pfn_to_pdx(epfn));
+sva = (unsigned long)mfn_to_page(spfn);
+eva = (unsigned long)mfn_to_page(epfn);
 
 /* Intialize all page */
-memset(mfn_to_page(spfn), -1,
-   (unsigned long)mfn_to_page(epfn) - (unsigned long)mfn_to_page(spfn));
+memset((void *)sva, -1, eva - sva);
 
 while (sva < eva)
 {
-- 
2.14.1




[Xen-devel] [RFC XEN PATCH v3 08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0

2017-09-10 Thread Haozhong Zhang
... to avoid interference with the PMEM driver and management
utilities in Dom0.
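
The hiding works by atomically overwriting the NFIT's signature with its bytes reversed: on a little-endian host the constant 0x4e494654 appears in memory as "TFIN", and 0x5449464e as "NFIT". A small sketch verifying those constants (little-endian byte order is an assumption, valid on x86):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Signature constants from acpi_nfit_zap()/acpi_nfit_reinstate(). */
#define SIG_ZAPPED   0x4e494654u /* "TFIN" in memory on little-endian */
#define SIG_RESTORED 0x5449464eu /* "NFIT" in memory on little-endian */

/* Copy a 32-bit signature into the 4-byte table-header field. */
static void write_sig(char sig_field[4], uint32_t sig)
{
    memcpy(sig_field, &sig, sizeof(sig));
}
```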

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Gang Wei <gang@intel.com>
Cc: Shane Wang <shane.w...@intel.com>
---
 xen/arch/x86/acpi/power.c |  7 +++
 xen/arch/x86/dom0_build.c |  5 +
 xen/arch/x86/shutdown.c   |  3 +++
 xen/arch/x86/tboot.c  |  4 
 xen/common/kexec.c|  3 +++
 xen/common/pmem.c | 21 +
 xen/drivers/acpi/nfit.c   | 21 +
 xen/include/xen/acpi.h|  2 ++
 xen/include/xen/pmem.h| 13 +
 9 files changed, 79 insertions(+)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 1e4e5680a7..d135715a49 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -178,6 +178,10 @@ static int enter_state(u32 state)
 
 freeze_domains();
 
+#ifdef CONFIG_NVDIMM_PMEM
+acpi_nfit_reinstate();
+#endif
+
 acpi_dmar_reinstate();
 
 if ( (error = disable_nonboot_cpus()) )
@@ -260,6 +264,9 @@ static int enter_state(u32 state)
 mtrr_aps_sync_end();
 adjust_vtd_irq_affinities();
 acpi_dmar_zap();
+#ifdef CONFIG_NVDIMM_PMEM
+acpi_nfit_zap();
+#endif
 thaw_domains();
 system_state = SYS_STATE_active;
 spin_unlock(&pm_lock);
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index f616b99ddc..10741e865a 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -452,6 +453,10 @@ int __init dom0_setup_permissions(struct domain *d)
 rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
 }
 
+#ifdef CONFIG_NVDIMM_PMEM
+rc |= pmem_dom0_setup_permission(d);
+#endif
+
 return rc;
 }
 
diff --git a/xen/arch/x86/shutdown.c b/xen/arch/x86/shutdown.c
index a87aa60add..1902dfe73e 100644
--- a/xen/arch/x86/shutdown.c
+++ b/xen/arch/x86/shutdown.c
@@ -550,6 +550,9 @@ void machine_restart(unsigned int delay_millisecs)
 
 if ( tboot_in_measured_env() )
 {
+#ifdef CONFIG_NVDIMM_PMEM
+acpi_nfit_reinstate();
+#endif
 acpi_dmar_reinstate();
 tboot_shutdown(TB_SHUTDOWN_REBOOT);
 }
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index 59d7c477f4..24e3b81ff1 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -488,6 +488,10 @@ int __init tboot_parse_dmar_table(acpi_table_handler dmar_handler)
 /* but dom0 will read real table, so must zap it there too */
 acpi_dmar_zap();
 
+#ifdef CONFIG_NVDIMM_PMEM
+acpi_nfit_zap();
+#endif
+
 return rc;
 }
 
diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index fcc68bd4d8..c8c6138e71 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -366,6 +366,9 @@ static int kexec_common_shutdown(void)
 watchdog_disable();
 console_start_sync();
 spin_debug_disable();
+#ifdef CONFIG_NVDIMM_PMEM
+acpi_nfit_reinstate();
+#endif
 acpi_dmar_reinstate();
 
 return 0;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 49648222a6..c9f5f6e904 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -18,6 +18,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 /*
@@ -128,3 +130,22 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
 
 return rc;
 }
+
+#ifdef CONFIG_X86
+
+int __init pmem_dom0_setup_permission(struct domain *d)
+{
+struct list_head *cur;
+struct pmem *pmem;
+int rc = 0;
+
+list_for_each(cur, &pmem_raw_regions)
+{
+pmem = list_entry(cur, struct pmem, link);
+rc |= iomem_deny_access(d, pmem->smfn, pmem->emfn - 1);
+}
+
+return rc;
+}
+
+#endif /* CONFIG_X86 */
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
index 68750c2edc..5f34cf2464 100644
--- a/xen/drivers/acpi/nfit.c
+++ b/xen/drivers/acpi/nfit.c
@@ -179,6 +179,24 @@ static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
 }
 }
 
+void acpi_nfit_zap(void)
+{
+uint32_t sig = 0x4e494654; /* "TFIN" */
+
+if ( nfit_desc.acpi_table )
+write_atomic((uint32_t *)_desc.acpi_table->header.signature[0],
+ sig);
+}
+
+void acpi_nfit_reinstate(void)
+{
+uint32_t sig = 0x5449464e; /* "NFIT" */
+
+if ( nfit_desc.acpi_table )
+write_atomic((uint32_t *)&nfit_desc.acpi_table->header.signature[0],
+ sig);
+}
+
 void __init acpi_nfit_boot_init(void)
 {
 acpi_status status;
@@ -193,6 +211,9 @@ void __init acpi_nfit_boot_init(void)
 map_pages_to_xen((unsigned long)nfit_desc.acpi_table, PFN_DOWN(nfit_addr),
  PFN_UP(nfit_addr + nfit_len) - PFN_DOWN(nfit_addr),
  PAGE_HYPERVISOR);
+
+/* Hide NFIT from Dom0. */
+acpi_nfit_zap();
 }
 
 void __init acpi_nfit_init(void)
diff

[Xen-devel] [RFC XEN PATCH v3 30/39] tools/libacpi: expose the minimum alignment used by mem_ops.alloc

2017-09-10 Thread Haozhong Zhang
The AML builder added later needs to allocate contiguous memory across
multiple calls to mem_ops.alloc(). Therefore, it needs to know the
minimal alignment used by mem_ops.alloc().
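
Knowing the minimum alignment lets the AML builder predict exactly where the next allocation will land, so it can keep one buffer contiguous across calls. A toy bump-allocator sketch, assuming the 16-byte alignment this patch configures (the helper names are illustrations, not the libacpi API):

```c
#include <assert.h>
#include <stdint.h>

#define MIN_ALIGN 16u /* matches min_alloc_byte_align in this patch */

/* Round a size up to the allocator's minimum alignment. */
static uint32_t align_up(uint32_t size)
{
    return (size + MIN_ALIGN - 1) & ~(MIN_ALIGN - 1);
}

/*
 * Toy bump allocator standing in for mem_ops.alloc().  A caller that
 * knows MIN_ALIGN can predict the next address: cur + align_up(size),
 * which is what allows building one AML blob across multiple calls.
 */
static uint32_t bump_cur = 0x1000; /* assumed start address */

static uint32_t toy_alloc(uint32_t size)
{
    uint32_t addr = bump_cur;

    bump_cur += align_up(size);
    return addr;
}
```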

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/firmware/hvmloader/util.c | 2 ++
 tools/libacpi/libacpi.h | 2 ++
 tools/libxl/libxl_x86_acpi.c| 2 ++
 3 files changed, 6 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 0c3f2d24cd..c2218d9fcb 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -990,6 +990,8 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
 ctxt.mem_ops.free = acpi_mem_free;
 ctxt.mem_ops.v2p = acpi_v2p;
 
+ctxt.min_alloc_byte_align = 16;
+
 acpi_build_tables(&ctxt, config);
 
 hvm_param_set(HVM_PARAM_VM_GENERATION_ID_ADDR, config->vm_gid_addr);
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index a2efd23b0b..157f63f7bc 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -52,6 +52,8 @@ struct acpi_ctxt {
 void (*free)(struct acpi_ctxt *ctxt, void *v, uint32_t size);
 unsigned long (*v2p)(struct acpi_ctxt *ctxt, void *v);
 } mem_ops;
+
+uint32_t min_alloc_byte_align; /* minimum alignment used by mem_ops.alloc */
 };
 
 struct acpi_config {
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index 176175676f..3b79b2179b 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -183,6 +183,8 @@ int libxl__dom_load_acpi(libxl__gc *gc,
 libxl_ctxt.c.mem_ops.v2p = virt_to_phys;
 libxl_ctxt.c.mem_ops.free = acpi_mem_free;
 
+libxl_ctxt.c.min_alloc_byte_align = 16;
+
 rc = init_acpi_config(gc, dom, b_info, &config);
 if (rc) {
 LOG(ERROR, "init_acpi_config failed (rc=%d)", rc);
-- 
2.14.1




[Xen-devel] [RFC XEN PATCH v3 15/39] x86_64/mm: allow customized location of extended frametable and M2P table

2017-09-10 Thread Haozhong Zhang
As the existing data in PMEM region is persistent, Xen hypervisor has
no knowledge of which part is free to be used for the frame table and
M2P table of that PMEM region. Instead, we will allow users or system
admins to specify the location of the frame table and M2P table.
The location is not necessarily at the beginning of the PMEM region,
which is different from the case of hotplugged RAM.

This commit adds the support for a customized page allocation
function, which is used to allocate the memory for the frame table and
M2P table. No page free function is added; instead, we require that,
if memory_add_common() fails, all allocated pages can be reclaimed or
have no effect outside memory_add_common().
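
The hook can be exercised in isolation. Below is a hedged, standalone sketch of the callback interface (mirroring struct mem_hotadd_alloc from this patch) with a toy backend that hands out 2^PAGETABLE_ORDER-page chunks from a fixed range:

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_MFN_X   (~0UL) /* stand-in for mfn_x(INVALID_MFN) */
#define PAGETABLE_ORDER 9      /* 512 pages per allocation, as on x86 */

/* Mirror of struct mem_hotadd_alloc introduced by this patch. */
struct mem_hotadd_alloc {
    unsigned long (*alloc_mfns)(void *opaque);
    void *opaque;
};

/* Toy backend: allocate chunks from [cur, epfn). */
struct toy_range {
    unsigned long cur, epfn;
};

static unsigned long toy_alloc_mfns(void *opaque)
{
    struct toy_range *r = opaque;
    unsigned long mfn = r->cur;

    if ( mfn + (1UL << PAGETABLE_ORDER) > r->epfn )
        return INVALID_MFN_X; /* range exhausted */
    r->cur += 1UL << PAGETABLE_ORDER;
    return mfn;
}
```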

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 83 
 1 file changed, 69 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index c8ffafe8a8..d92307ca0b 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -106,13 +106,44 @@ struct mem_hotadd_info
 unsigned long cur;
 };
 
+struct mem_hotadd_alloc
+{
+/*
+ * Allocate 2^PAGETABLE_ORDER pages.
+ *
+ * No free function is added right now, so we require that all
+ * allocated pages can be reclaimed easily or have no effect outside
+ * memory_add_common(), if memory_add_common() fails.
+ *
+ * For example, alloc_hotadd_mfn(), which is used in RAM hotplug,
+ * allocates pages from the hotplugged RAM. If memory_add_common()
+ * fails, the hotplugged RAM will not be available to Xen, so
+ * pages allocated by alloc_hotadd_mfns() will never be used and
+ * have no effect.
+ *
+ * Parameters:
+ *  opaque:   arguments of the allocator (depending on the implementation)
+ *
+ * Return:
+ *  On success, return MFN of the first page.
+ *  Otherwise, return mfn_x(INVALID_MFN).
+ */
+unsigned long (*alloc_mfns)(void *opaque);
+
+/*
+ * Additional arguments passed to @alloc_mfns().
+ */
+void *opaque;
+};
+
 static int hotadd_mem_valid(unsigned long pfn, struct mem_hotadd_info *info)
 {
 return (pfn < info->epfn && pfn >= info->spfn);
 }
 
-static unsigned long alloc_hotadd_mfn(struct mem_hotadd_info *info)
+static unsigned long alloc_hotadd_mfn(void *opaque)
 {
+struct mem_hotadd_info *info = opaque;
 unsigned mfn;
 
 ASSERT((info->cur + ( 1UL << PAGETABLE_ORDER) < info->epfn) &&
@@ -315,7 +346,8 @@ static void destroy_m2p_mapping(struct mem_hotadd_info *info)
  * spfn/epfn: the pfn ranges to be setup
  * free_s/free_e: the pfn ranges that is free still
  */
-static int setup_compat_m2p_table(struct mem_hotadd_info *info)
+static int setup_compat_m2p_table(struct mem_hotadd_info *info,
+  struct mem_hotadd_alloc *alloc)
 {
 unsigned long i, va, smap, emap, rwva, epfn = info->epfn, mfn;
 unsigned int n;
@@ -369,7 +401,13 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
 if ( n == CNT )
 continue;
 
-mfn = alloc_hotadd_mfn(info);
+mfn = alloc->alloc_mfns(alloc->opaque);
+if ( mfn == mfn_x(INVALID_MFN) )
+{
+err = -ENOMEM;
+break;
+}
+
 err = map_pages_to_xen(rwva, mfn, 1UL << PAGETABLE_ORDER,
PAGE_HYPERVISOR);
 if ( err )
@@ -389,7 +427,8 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
  * Allocate and map the machine-to-phys table.
  * The L3 for RO/RWRW MPT and the L2 for compatible MPT should be setup already
  */
-static int setup_m2p_table(struct mem_hotadd_info *info)
+static int setup_m2p_table(struct mem_hotadd_info *info,
+   struct mem_hotadd_alloc *alloc)
 {
 unsigned long i, va, smap, emap;
 unsigned int n;
@@ -438,7 +477,13 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
 break;
 if ( n < CNT )
 {
-unsigned long mfn = alloc_hotadd_mfn(info);
+unsigned long mfn = alloc->alloc_mfns(alloc->opaque);
+
+if ( mfn == mfn_x(INVALID_MFN) )
+{
+ret = -ENOMEM;
+goto error;
+}
 
 ret = map_pages_to_xen(
 RDWR_MPT_VIRT_START + i * sizeof(unsigned long),
@@ -483,7 +528,7 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
 #undef CNT
 #undef MFN
 
-ret = setup_compat_m2p_table(info);
+ret = setup_compat_m2p_table(info, alloc);
 error:
 return ret;
 }
@@ -762,7 +807,7 @@ static void cleanup_frame_table(unsigned long spfn, unsigned long epfn)
 }
 
 static int setup_frametable_chunk(vo

[Xen-devel] [RFC XEN PATCH v3 05/39] x86/mm: exclude PMEM regions from initial frametable

2017-09-10 Thread Haozhong Zhang
No specification defines that PMEM regions cannot appear in margins
between RAM regions. If that does happen, init_frametable() will need
to allocate RAM for the part of frametable of PMEM regions. However,
PMEM regions can be very large (several terabytes or more), so
init_frametable() may fail.

Because Xen does not use PMEM at boot time, we can defer the actual
resource allocation of the frametable of PMEM regions. At boot time,
all frametable pages of PMEM regions appearing between RAM regions
are mapped to one RAM page filled with 0xff.

Any attempt to write to those frametable pages before their actual
resources are allocated implies a bug in Xen. Therefore, a read-only
mapping is used here to make such bugs explicit.
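
To see why the deferral matters, a rough back-of-the-envelope sketch: assuming a 32-byte struct page_info and 4 KiB pages (both assumptions for x86-64), every terabyte of PMEM needs 8 GiB of frametable, which cannot reasonably be carved out of boot RAM:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096ULL /* 4 KiB pages */
#define PAGE_INFO_SIZE    32ULL   /* assumed sizeof(struct page_info), x86-64 */

/* Bytes of frametable needed to cover a region of the given size. */
static uint64_t frametable_bytes(uint64_t region_bytes)
{
    return (region_bytes / PAGE_SIZE_ASSUMED) * PAGE_INFO_SIZE;
}
```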

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: George Dunlap <george.dun...@eu.citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
---
 xen/arch/x86/mm.c | 117 +-
 xen/arch/x86/setup.c  |   4 ++
 xen/drivers/acpi/Makefile |   2 +
 xen/drivers/acpi/nfit.c   | 116 +
 xen/include/acpi/actbl1.h |  43 +
 xen/include/xen/acpi.h|   7 +++
 6 files changed, 278 insertions(+), 11 deletions(-)
 create mode 100644 xen/drivers/acpi/nfit.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e5a029c9be..2fdf609805 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -83,6 +83,9 @@
  * an application-supplied buffer).
  */
 
+#ifdef CONFIG_NVDIMM_PMEM
+#include 
+#endif
 #include 
 #include 
 #include 
@@ -196,31 +199,123 @@ static int __init parse_mmio_relax(const char *s)
 }
 custom_param("mmio-relax", parse_mmio_relax);
 
-static void __init init_frametable_chunk(void *start, void *end)
+static void __init init_frametable_ram_chunk(unsigned long s, unsigned long e)
 {
-unsigned long s = (unsigned long)start;
-unsigned long e = (unsigned long)end;
-unsigned long step, mfn;
+unsigned long cur, step, mfn;
 
-ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
-for ( ; s < e; s += step << PAGE_SHIFT )
+for ( cur = s; cur < e; cur += step << PAGE_SHIFT )
 {
 step = 1UL << (cpu_has_page1gb &&
-   !(s & ((1UL << L3_PAGETABLE_SHIFT) - 1)) ?
+   !(cur & ((1UL << L3_PAGETABLE_SHIFT) - 1)) ?
L3_PAGETABLE_SHIFT - PAGE_SHIFT :
L2_PAGETABLE_SHIFT - PAGE_SHIFT);
 /*
  * The hardcoded 4 below is arbitrary - just pick whatever you think
  * is reasonable to waste as a trade-off for using a large page.
  */
-while ( step && s + (step << PAGE_SHIFT) > e + (4 << PAGE_SHIFT) )
+while ( step && cur + (step << PAGE_SHIFT) > e + (4 << PAGE_SHIFT) )
 step >>= PAGETABLE_ORDER;
 mfn = alloc_boot_pages(step, step);
-map_pages_to_xen(s, mfn, step, PAGE_HYPERVISOR);
+map_pages_to_xen(cur, mfn, step, PAGE_HYPERVISOR);
 }
 
-memset(start, 0, end - start);
-memset(end, -1, s - e);
+memset((void *)s, 0, e - s);
+memset((void *)e, -1, cur - e);
+}
+
+#ifdef CONFIG_NVDIMM_PMEM
+static void __init init_frametable_pmem_chunk(unsigned long s, unsigned long e)
+{
+static unsigned long pmem_init_frametable_mfn;
+
+ASSERT(!((s | e) & (PAGE_SIZE - 1)));
+
+if ( !pmem_init_frametable_mfn )
+{
+pmem_init_frametable_mfn = alloc_boot_pages(1, 1);
+if ( !pmem_init_frametable_mfn )
+panic("Not enough memory for pmem initial frame table page");
+memset(mfn_to_virt(pmem_init_frametable_mfn), -1, PAGE_SIZE);
+}
+
+while ( s < e )
+{
+/*
+ * The real frame table entries of a pmem region will be
+ * created when the pmem region is registered to hypervisor.
+ * Any write attempt to the initial entries of that pmem
+ * region implies potential hypervisor bugs. In order to make
+ * those bugs explicit, map those initial entries as read-only.
+ */
+map_pages_to_xen(s, pmem_init_frametable_mfn, 1, PAGE_HYPERVISOR_RO);
+s += PAGE_SIZE;
+}
+}
+#endif /* CONFIG_NVDIMM_PMEM */
+
+static void __init init_frametable_chunk(void *start, void *end)
+{
+unsigned long s = (unsigned long)start;
+unsigned long e = (unsigned long)end;
+#ifdef CONFIG_NVDIMM_PMEM
+unsigned long pmem_smfn, pmem_emfn;
+unsigned long pmem_spage = s, pmem_epage = s;
+unsigned long pmem_page_aligned;
+bool found = false;
+#endif /* CONFIG_NVDIMM_PMEM */
+
+ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
+
+#ifndef CONFIG_NVDIMM_PMEM
+init_frametable_ram_chunk(s, e);
+#else
+while ( s < e )
+{
+/* No p

[Xen-devel] [RFC XEN PATCH v3 28/39] xen: add hypercall XENMEM_populate_pmem_map

2017-09-10 Thread Haozhong Zhang
This hypercall will be used by device models to map host PMEM pages to
guest.
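
Before touching the p2m, a handler for such an operation has to sanity-check the (mfn, gfn, nr_mfns) triple. A hedged, standalone sketch of the kind of validation involved (the bound and the helper name are assumptions, not the actual hypercall code):

```c
#include <assert.h>
#include <errno.h>

#define MAX_MFN_ASSUMED (1UL << 40) /* assumed upper bound on machine frames */

/*
 * Reject empty and wrapping ranges before any mapping is attempted,
 * as a real XENMEM_populate_pmem_map handler would have to.
 */
static int pmem_map_args_ok(unsigned long mfn, unsigned long gfn,
                            unsigned long nr_mfns)
{
    if ( !nr_mfns )
        return -EINVAL;
    if ( mfn + nr_mfns < mfn || gfn + nr_mfns < gfn ) /* overflow */
        return -EINVAL;
    if ( mfn + nr_mfns > MAX_MFN_ASSUMED )
        return -EINVAL;
    return 0;
}
```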

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Daniel De Graaf <dgde...@tycho.nsa.gov>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
CC: Jan Beulich <jbeul...@suse.com>
---
 tools/flask/policy/modules/xen.if   |  2 +-
 tools/libxc/include/xenctrl.h   | 17 ++
 tools/libxc/xc_domain.c | 15 +
 xen/common/compat/memory.c  |  1 +
 xen/common/memory.c | 44 +
 xen/include/public/memory.h | 14 +++-
 xen/include/xsm/dummy.h | 11 ++
 xen/include/xsm/xsm.h   | 12 ++
 xen/xsm/dummy.c |  4 
 xen/xsm/flask/hooks.c   | 13 +++
 xen/xsm/flask/policy/access_vectors |  2 ++
 11 files changed, 133 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/modules/xen.if b/tools/flask/policy/modules/xen.if
index 912640002e..9634dee25f 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -55,7 +55,7 @@ define(`create_domain_common', `
psr_cmt_op psr_cat_op soft_reset };
allow $1 $2:security check_context;
allow $1 $2:shadow enable;
-   allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp };
+   allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp populate_pmem_map };
allow $1 $2:grant setup;
allow $1 $2:hvm { cacheattr getparam hvmctl sethvmc
setparam nested altp2mhvm altp2mhvm_op dm };
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 41e5e3408c..a81dcdbe58 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2643,6 +2643,23 @@ int xc_nvdimm_pmem_setup_data(xc_interface *xch,
   unsigned long smfn, unsigned long emfn,
   unsigned long mgmt_smfn, unsigned long mgmt_emfn);
 
+/*
+ * Map specified host PMEM pages to the specified guest address.
+ *
+ * Parameters:
+ *  xch: xc interface handle
+ *  domid:   the target domain id
+ *  mfn: the start MFN of the PMEM pages
+ *  gfn: the start GFN of the target guest physical pages
+ *  nr_mfns: the number of PMEM pages to be mapped
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_domain_populate_pmem_map(xc_interface *xch, uint32_t domid,
+unsigned long mfn, unsigned long gfn,
+unsigned long nr_mfns);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 3bab4e8bab..b548da750a 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2397,6 +2397,21 @@ int xc_domain_soft_reset(xc_interface *xch,
 domctl.domain = (domid_t)domid;
 return do_domctl(xch, &domctl);
 }
+
+int xc_domain_populate_pmem_map(xc_interface *xch, uint32_t domid,
+unsigned long mfn, unsigned long gfn,
+unsigned long nr_mfns)
+{
+struct xen_pmem_map args = {
+.domid   = domid,
+.mfn = mfn,
+.gfn = gfn,
+.nr_mfns = nr_mfns,
+};
+
+return do_memory_op(xch, XENMEM_populate_pmem_map, &args, sizeof(args));
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index 35bb259808..51bec835b9 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -525,6 +525,7 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 case XENMEM_add_to_physmap:
 case XENMEM_remove_from_physmap:
 case XENMEM_access_op:
+case XENMEM_populate_pmem_map:
 break;
 
 case XENMEM_get_vnumainfo:
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 26da6050f6..31ef480562 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include <xen/pmem.h>
 #include 
 #include 
 #include 
@@ -1379,6 +1380,49 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 }
 #endif
 
+#ifdef CONFIG_NVDIMM_PMEM
+case XENMEM_populate_pmem_map:
+{
+struct xen_pmem_map map;
+struct xen_pmem_map_args args;
+
+if ( copy_from_guest(&map, arg, 1) )
+return -EFAULT;
+
+if ( map.domid == DOMID_SELF )
+return -EINVAL;
+
+d = rcu_lock_domain_by_any_id(map.domid);
+if ( !d )
+return -EINVAL;
+
+rc = xsm_populate_pmem_map(XSM_TARGET, curr_d, d);
+if ( rc )
+{
+rcu_unlock_domain(d);
+re

[Xen-devel] [RFC XEN PATCH v3 16/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_setup to setup management PMEM region

2017-09-10 Thread Haozhong Zhang
Add a command XEN_SYSCTL_nvdimm_pmem_setup to hypercall
XEN_SYSCTL_nvdimm_op to setup the frame table and M2P table of a PMEM
region. This command is currently used to setup the management PMEM
region which is used to store the frame table and M2P table of other
PMEM regions and itself. The management PMEM region should not be
mapped to guest.

PMEM pages are not added in any Xen or domain heaps. A new flag
PGC_pmem_page is used to indicate whether a page is from PMEM and
avoid returning PMEM pages to heaps.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: George Dunlap <george.dun...@eu.citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
---
 tools/libxc/include/xenctrl.h |  16 +
 tools/libxc/xc_misc.c |  34 ++
 xen/arch/x86/mm.c |   3 +-
 xen/arch/x86/x86_64/mm.c  |  72 +
 xen/common/pmem.c | 142 ++
 xen/include/asm-x86/mm.h  |  10 ++-
 xen/include/public/sysctl.h   |  18 ++
 xen/include/xen/pmem.h|   8 +++
 8 files changed, 301 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d750e67460..7c5707fe11 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2605,6 +2605,22 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch,
 int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
void *buffer, uint32_t *nr);
 
+/*
+ * Setup the specified PMEM pages for management usage. If success,
+ * these PMEM pages can be used to store the frametable and M2P table
+ * of itself and other PMEM pages. These management PMEM pages will
+ * never be mapped to guest.
+ *
+ * Parameters:
+ *  xch:xc interface handle
+ *  smfn, emfn: the start and end MFN of the PMEM region
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
+  unsigned long smfn, unsigned long emfn);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index f9ce802eda..bebe6d04c8 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -975,6 +975,40 @@ out:
 return rc;
 }
 
+static void xc_nvdimm_pmem_setup_common(struct xen_sysctl *sysctl,
+unsigned long smfn, unsigned long emfn,
+unsigned long mgmt_smfn,
+unsigned long mgmt_emfn)
+{
+xen_sysctl_nvdimm_op_t *nvdimm = &sysctl->u.nvdimm;
+xen_sysctl_nvdimm_pmem_setup_t *setup = &nvdimm->u.pmem_setup;
+
+sysctl->cmd = XEN_SYSCTL_nvdimm_op;
+nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_setup;
+nvdimm->pad = 0;
+nvdimm->err = 0;
+setup->smfn = smfn;
+setup->emfn = emfn;
+setup->mgmt_smfn = mgmt_smfn;
+setup->mgmt_emfn = mgmt_emfn;
+}
+
+int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
+  unsigned long smfn, unsigned long emfn)
+{
+DECLARE_SYSCTL;
+int rc;
+
+xc_nvdimm_pmem_setup_common(&sysctl, smfn, emfn, smfn, emfn);
+sysctl.u.nvdimm.u.pmem_setup.type = PMEM_REGION_TYPE_MGMT;
+
+rc = do_sysctl(xch, &sysctl);
+if ( rc && sysctl.u.nvdimm.err )
+rc = -sysctl.u.nvdimm.err;
+
+return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 2fdf609805..93ccf198c9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2341,7 +2341,8 @@ void put_page(struct page_info *page)
 
 if ( unlikely((nx & PGC_count_mask) == 0) )
 {
-if ( cleanup_page_cacheattr(page) == 0 )
+if ( !is_pmem_page(page) /* PMEM page is not allocated from Xen heap. */
+ && cleanup_page_cacheattr(page) == 0 )
 free_domheap_page(page);
 else
 gdprintk(XENLOG_WARNING,
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index d92307ca0b..7dbc5e966c 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1535,6 +1535,78 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
 return ret;
 }
 
+#ifdef CONFIG_NVDIMM_PMEM
+
+static void pmem_init_frame_table(unsigned long smfn, unsigned long emfn)
+{
+struct page_info *page = mfn_to_page(smfn), *epage = mfn_to_page(emfn);
+
+while ( page < epage )
+{
+page->count_info = PGC_state_free | PGC_pmem_page;
+page++;
+}
+}
+
+/**
+ * Initialize frametable and M2P for the specified PMEM region.
+ *
+ * Parameters:
+ *  smfn, emfn: the start and end MFN of the PMEM region
+ *  mgmt_smfn,
+ *  mgmt_emfn:  the start and end MFN of the PMEM region used t

[Xen-devel] [RFC XEN PATCH v3 04/39] xen/common: add Kconfig item for pmem support

2017-09-10 Thread Haozhong Zhang
Add CONFIG_NVDIMM_PMEM to enable NVDIMM persistent memory support. By
default, it's N.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: George Dunlap <george.dun...@eu.citrix.com>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Stefano Stabellini <sstabell...@kernel.org>
Cc: Tim Deegan <t...@xen.org>
Cc: Wei Liu <wei.l...@citrix.com>
---
 xen/common/Kconfig | 8 
 1 file changed, 8 insertions(+)

diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index dc8e876439..d4565b1c7b 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -279,4 +279,12 @@ config CMDLINE_OVERRIDE
 
  This is used to work around broken bootloaders. This should
  be set to 'N' under normal conditions.
+
+config NVDIMM_PMEM
+   bool "Persistent memory support"
+   default n
+   ---help---
+ Enable support for NVDIMM in the persistent memory mode.
+
+ If unsure, say N.
 endmenu
-- 
2.14.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [RFC XEN PATCH v3 11/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions

2017-09-10 Thread Haozhong Zhang
XEN_SYSCTL_nvdimm_pmem_get_regions, which is a command of hypercall
XEN_SYSCTL_nvdimm_op, is to get a list of PMEM regions of specified
type (see PMEM_REGION_TYPE_*).

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
---
 tools/libxc/include/xenctrl.h | 18 
 tools/libxc/xc_misc.c | 63 
 xen/common/pmem.c | 67 +++
 xen/include/public/sysctl.h   | 27 +
 4 files changed, 175 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index e4d26967ba..d750e67460 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2587,6 +2587,24 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
 int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch,
   uint8_t type, uint32_t *nr);
 
+/*
+ * Get an array of information of PMEM regions of the specified type.
+ *
+ * Parameters:
+ *  xch:xc interface handle
+ *  type:   the type of PMEM regions, must be one of PMEM_REGION_TYPE_*
+ *  buffer: the buffer where the information of PMEM regions is returned,
+ *  the caller should allocate enough memory for it.
+ *  nr :IN: the maximum number of PMEM regions that can be returned
+ *  in @buffer
+ *  OUT: the actual number of returned PMEM regions in @buffer
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
+   void *buffer, uint32_t *nr);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index fa66410869..f9ce802eda 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -912,6 +912,69 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
 return rc;
 }
 
+int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
+   void *buffer, uint32_t *nr)
+{
+DECLARE_SYSCTL;
+DECLARE_HYPERCALL_BOUNCE(buffer, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+xen_sysctl_nvdimm_op_t *nvdimm = &sysctl.u.nvdimm;
+xen_sysctl_nvdimm_pmem_regions_t *regions = &nvdimm->u.pmem_regions;
+unsigned int max;
+unsigned long size;
+int rc;
+
+if ( !buffer || !nr )
+return -EINVAL;
+
+max = *nr;
+if ( !max )
+return 0;
+
+switch ( type )
+{
+case PMEM_REGION_TYPE_RAW:
+size = sizeof(xen_sysctl_nvdimm_pmem_raw_region_t) * max;
+break;
+
+default:
+return -EINVAL;
+}
+
+HYPERCALL_BOUNCE_SET_SIZE(buffer, size);
+if ( xc_hypercall_bounce_pre(xch, buffer) )
+return -EFAULT;
+
+sysctl.cmd = XEN_SYSCTL_nvdimm_op;
+nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_get_regions;
+nvdimm->pad = 0;
+nvdimm->err = 0;
+regions->type = type;
+regions->num_regions = max;
+
+switch ( type )
+{
+case PMEM_REGION_TYPE_RAW:
+set_xen_guest_handle(regions->u_buffer.raw_regions, buffer);
+break;
+
+default:
+rc = -EINVAL;
+goto out;
+}
+
+rc = do_sysctl(xch, &sysctl);
+if ( !rc )
+*nr = regions->num_regions;
+else if ( nvdimm->err )
+rc = -nvdimm->err;
+
+out:
+xc_hypercall_bounce_post(xch, buffer);
+
+return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 995dfcb867..a737e7dc71 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 
+#include 
+
 /*
  * All PMEM regions presenting in NFIT SPA range structures are linked
  * in this list.
@@ -122,6 +124,67 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
 return rc;
 }
 
+static int pmem_get_raw_regions(
+XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) regions,
+unsigned int *num_regions)
+{
+struct list_head *cur;
+unsigned int nr = 0, max = *num_regions;
+xen_sysctl_nvdimm_pmem_raw_region_t region;
+int rc = 0;
+
+if ( !guest_handle_okay(regions, max * sizeof(region)) )
+return -EINVAL;
+
+list_for_each(cur, &pmem_raw_regions)
+{
+struct pmem *pmem = list_entry(cur, struct pmem, link);
+
+if ( nr >= max )
+break;
+
+region.smfn = pmem->smfn;
+region.emfn = pmem->emfn;
+region.pxm = pmem->u.raw.pxm;
+
+if ( copy_to_guest_offset(regions, nr, &region, 1) )
+{
+rc = -EFAULT;
+break;
+}
+
+nr++;
+}
+
+*num_regions = nr;
+
+return rc;
+}
+
+static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions

[Xen-devel] [RFC XEN PATCH v3 25/39] tools/xen-ndctl: add option '--data' to command 'list'

2017-09-10 Thread Haozhong Zhang
If the option '--data' is present, the command 'list' will list all
PMEM regions for guest data usage.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/misc/xen-ndctl.c | 40 ++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index 320633ae05..33817863ca 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -58,10 +58,11 @@ static const struct xen_ndctl_cmd
 
 {
 .name= "list",
-.syntax  = "[--all | --raw | --mgmt]",
+.syntax  = "[--all | --raw | --mgmt | --data]",
 .help= "--all: the default option, list all PMEM regions of following types.\n"
"--raw: list all PMEM regions detected by Xen hypervisor.\n"
-   "--mgmt: list all PMEM regions for management usage.\n",
+   "--mgmt: list all PMEM regions for management usage.\n"
+   "--data: list all PMEM regions that can be mapped to guest.\n",
 .handler = handle_list,
 .need_xc = true,
 },
@@ -209,6 +210,40 @@ static int handle_list_mgmt(void)
 return rc;
 }
 
+static int handle_list_data(void)
+{
+int rc;
+unsigned int nr = 0, i;
+xen_sysctl_nvdimm_pmem_data_region_t *data_list;
+
+rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_DATA, &nr);
+if ( rc )
+{
+fprintf(stderr, "Cannot get the number of PMEM regions: %s.\n",
+strerror(-rc));
+return rc;
+}
+
+data_list = malloc(nr * sizeof(*data_list));
+if ( !data_list )
+return -ENOMEM;
+
+rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_DATA, data_list, &nr);
+if ( rc )
+goto out;
+
+printf("Data PMEM regions:\n");
+for ( i = 0; i < nr; i++ )
+printf(" %u: MFN 0x%lx - 0x%lx, MGMT MFN 0x%lx - 0x%lx\n",
+   i, data_list[i].smfn, data_list[i].emfn,
+   data_list[i].mgmt_smfn, data_list[i].mgmt_emfn);
+
+ out:
+free(data_list);
+
+return rc;
+}
+
 static const struct list_handlers {
 const char *option;
 int (*handler)(void);
@@ -216,6 +251,7 @@ static const struct list_handlers {
 {
 { "--raw", handle_list_raw },
 { "--mgmt", handle_list_mgmt },
+{ "--data", handle_list_data },
 };
 
 static const unsigned int nr_list_hndrs =
-- 
2.14.1




[Xen-devel] [RFC XEN PATCH v3 22/39] tools/xen-ndctl: add command 'setup-data'

2017-09-10 Thread Haozhong Zhang
This command is to query Xen hypervisor to setup the specified PMEM
range for guest data usage.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/misc/xen-ndctl.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index 058f8ccaf5..320633ae05 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -37,6 +37,7 @@ static int handle_help(int argc, char *argv[]);
 static int handle_list(int argc, char *argv[]);
 static int handle_list_cmds(int argc, char *argv[]);
 static int handle_setup_mgmt(int argc, char *argv[]);
+static int handle_setup_data(int argc, char *argv[]);
 
 static const struct xen_ndctl_cmd
 {
@@ -72,6 +73,18 @@ static const struct xen_ndctl_cmd
 .handler = handle_list_cmds,
 },
 
+{
+.name= "setup-data",
+.syntax  = "<smfn> <emfn> <mgmt_smfn> <mgmt_emfn>",
+.help= "Setup a PMEM region from MFN 'smfn' to 'emfn' for guest data usage,\n"
+   "which can be used as the backend of the virtual NVDIMM devices.\n\n"
+   "PMEM pages from MFN 'mgmt_smfn' to 'mgmt_emfn' are used to manage\n"
+   "the above PMEM region, and should not overlap with MFN from 'smfn'\n"
+   "to 'emfn'.\n",
+.handler = handle_setup_data,
+.need_xc = true,
+},
+
 {
 .name= "setup-mgmt",
 .syntax  = "<smfn> <emfn>",
@@ -277,6 +290,29 @@ static int handle_setup_mgmt(int argc, char **argv)
 return xc_nvdimm_pmem_setup_mgmt(xch, smfn, emfn);
 }
 
+static int handle_setup_data(int argc, char **argv)
+{
+unsigned long smfn, emfn, mgmt_smfn, mgmt_emfn;
+
+if ( argc < 5 )
+{
+fprintf(stderr, "Too few arguments.\n\n");
+show_help(argv[0]);
+return -EINVAL;
+}
+
+if ( !string_to_mfn(argv[1], &smfn) ||
+ !string_to_mfn(argv[2], &emfn) ||
+ !string_to_mfn(argv[3], &mgmt_smfn) ||
+ !string_to_mfn(argv[4], &mgmt_emfn) )
+return -EINVAL;
+
+if ( argc > 5 )
+return handle_unrecognized_argument(argv[0], argv[5]);
+
+return xc_nvdimm_pmem_setup_data(xch, smfn, emfn, mgmt_smfn, mgmt_emfn);
+}
+
 int main(int argc, char *argv[])
 {
 unsigned int i;
-- 
2.14.1




[Xen-devel] [RFC XEN PATCH v3 23/39] xen/pmem: support PMEM_REGION_TYPE_DATA for XEN_SYSCTL_nvdimm_pmem_get_regions_nr

2017-09-10 Thread Haozhong Zhang
Allow XEN_SYSCTL_nvdimm_pmem_get_regions_nr to return the number of
data PMEM regions.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
---
 tools/libxc/xc_misc.c | 3 ++-
 xen/common/pmem.c | 4 
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index ef2e9e0656..db74df853a 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -896,7 +896,8 @@ int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
 
 if ( !nr ||
  (type != PMEM_REGION_TYPE_RAW &&
-  type != PMEM_REGION_TYPE_MGMT) )
+  type != PMEM_REGION_TYPE_MGMT &&
+  type != PMEM_REGION_TYPE_DATA) )
 return -EINVAL;
 
 sysctl.cmd = XEN_SYSCTL_nvdimm_op;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 6891ed7a47..cbe557c220 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -162,6 +162,10 @@ static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
 regions_nr->num_regions = nr_mgmt_regions;
 break;
 
+case PMEM_REGION_TYPE_DATA:
+regions_nr->num_regions = nr_data_regions;
+break;
+
 default:
 rc = -EINVAL;
 }
-- 
2.14.1




[Xen-devel] [RFC XEN PATCH v3 12/39] tools/xen-ndctl: add NVDIMM management util 'xen-ndctl'

2017-09-10 Thread Haozhong Zhang
The kernel NVDIMM driver and the traditional NVDIMM management
utilities in Dom0 do not work now. 'xen-ndctl' is added as an
alternative, which manages NVDIMM via Xen hypercalls.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 .gitignore |   1 +
 tools/misc/Makefile|   4 ++
 tools/misc/xen-ndctl.c | 172 +
 3 files changed, 177 insertions(+)
 create mode 100644 tools/misc/xen-ndctl.c

diff --git a/.gitignore b/.gitignore
index ecb198f914..30655673f7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -216,6 +216,7 @@ tools/misc/xen-hvmctx
 tools/misc/xenlockprof
 tools/misc/lowmemd
 tools/misc/xencov
+tools/misc/xen-ndctl
 tools/pkg-config/*
 tools/qemu-xen-build
 tools/xentrace/xenalyze
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index eaa28793ef..124775b7f4 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -32,6 +32,7 @@ INSTALL_SBIN   += xenpm
 INSTALL_SBIN   += xenwatchdogd
 INSTALL_SBIN   += xen-livepatch
 INSTALL_SBIN   += xen-diag
+INSTALL_SBIN   += xen-ndctl
 INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed in a private bin/
@@ -118,4 +119,7 @@ xen-lowmemd: xen-lowmemd.o
 xencov: xencov.o
$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-ndctl: xen-ndctl.o
+   $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 -include $(DEPS_INCLUDE)
diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
new file mode 100644
index 00..de40e29ff6
--- /dev/null
+++ b/tools/misc/xen-ndctl.c
@@ -0,0 +1,172 @@
+/*
+ * xen-ndctl.c
+ *
+ * Xen NVDIMM management tool
+ *
+ * Copyright (C) 2017,  Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without restriction,
+ * including without limitation the rights to use, copy, modify, merge,
+ * publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so,
+ * subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+static xc_interface *xch;
+
+static int handle_help(int argc, char *argv[]);
+static int handle_list_cmds(int argc, char *argv[]);
+
+static const struct xen_ndctl_cmd
+{
+const char *name;
+const char *syntax;
+const char *help;
+int (*handler)(int argc, char **argv);
+bool need_xc;
+} cmds[] =
+{
+{
+.name= "help",
+.syntax  = "[command]",
+.help= "Show this message or the help message of 'command'.\n"
+   "Use command 'list-cmds' to list all supported commands.\n",
+.handler = handle_help,
+},
+
+{
+.name= "list-cmds",
+.syntax  = "",
+.help= "List all supported commands.\n",
+.handler = handle_list_cmds,
+},
+};
+
+static const unsigned int nr_cmds = sizeof(cmds) / sizeof(cmds[0]);
+
+static void show_help(const char *cmd)
+{
+unsigned int i;
+
+if ( !cmd )
+{
+fprintf(stderr,
+"Usage: xen-ndctl  [args]\n\n"
+"List all supported commands by 'xen-ndctl list-cmds'.\n"
+"Get help of a command by 'xen-ndctl help '.\n");
+return;
+}
+
+for ( i = 0; i < nr_cmds; i++ )
+if ( !strcmp(cmd, cmds[i].name) )
+{
+fprintf(stderr, "Usage: xen-ndctl %s %s\n\n%s",
+cmds[i].name, cmds[i].syntax, cmds[i].help);
+break;
+}
+
+if ( i == nr_cmds )
+fprintf(stderr, "Unsupported command '%s'.\n"
+"List all supported commands by 'xen-ndctl list-cmds'.\n",
+cmd);
+}
+
+static int handle_unrecognized_argument(const char *cmd, const char *argv)
+{
+fprintf(stderr, "Unrecognized argument: %s.\n\n", argv);
+show_help(cmd);
+
return -EINVAL;

[Xen-devel] [RFC XEN PATCH v3 26/39] xen/pmem: add function to map PMEM pages to HVM domain

2017-09-10 Thread Haozhong Zhang
pmem_populate() is added to map the specified data PMEM pages to an
HVM domain. No caller is added in this commit.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
---
 xen/common/domain.c |   3 ++
 xen/common/pmem.c   | 141 
 xen/include/xen/pmem.h  |  19 +++
 xen/include/xen/sched.h |   3 ++
 4 files changed, 166 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 5aebcf265f..4354342b02 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -290,6 +290,9 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
 INIT_PAGE_LIST_HEAD(>page_list);
 INIT_PAGE_LIST_HEAD(>xenpage_list);
 
+spin_lock_init(>pmem_lock);
+INIT_PAGE_LIST_HEAD(>pmem_page_list);
+
 spin_lock_init(>node_affinity_lock);
 d->node_affinity = NODE_MASK_ALL;
 d->auto_node_affinity = 1;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index ed4a014c30..2f9ad64a26 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -17,10 +17,12 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -78,6 +80,31 @@ static bool check_overlap(unsigned long smfn1, unsigned long emfn1,
(emfn1 > smfn2 && emfn1 <= emfn2);
 }
 
+static bool check_cover(struct list_head *list,
+unsigned long smfn, unsigned long emfn)
+{
+struct list_head *cur;
+struct pmem *pmem;
+unsigned long pmem_smfn, pmem_emfn;
+
+list_for_each(cur, list)
+{
+pmem = list_entry(cur, struct pmem, link);
+pmem_smfn = pmem->smfn;
+pmem_emfn = pmem->emfn;
+
+if ( smfn < pmem_smfn )
+return false;
+
+if ( emfn <= pmem_emfn )
+return true;
+
+smfn = max(smfn, pmem_emfn);
+}
+
+return false;
+}
+
 /**
  * Add a PMEM region to a list. All PMEM regions in the list are
  * sorted in the ascending order of the start address. A PMEM region,
@@ -600,6 +627,120 @@ int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
 
 #ifdef CONFIG_X86
 
+static int pmem_assign_page(struct domain *d, struct page_info *pg,
+unsigned long gfn)
+{
+int rc;
+
+if ( pg->count_info != (PGC_state_free | PGC_pmem_page) )
+return -EBUSY;
+
+pg->count_info = PGC_allocated | PGC_state_inuse | PGC_pmem_page | 1;
+pg->u.inuse.type_info = 0;
+page_set_owner(pg, d);
+
+rc = guest_physmap_add_page(d, _gfn(gfn), _mfn(page_to_mfn(pg)), 0);
+if ( rc )
+{
+page_set_owner(pg, NULL);
+pg->count_info = PGC_state_free | PGC_pmem_page;
+
+return rc;
+}
+
+spin_lock(>pmem_lock);
+page_list_add_tail(pg, >pmem_page_list);
+spin_unlock(>pmem_lock);
+
+return 0;
+}
+
+static int pmem_unassign_page(struct domain *d, struct page_info *pg,
+  unsigned long gfn)
+{
+int rc;
+
+spin_lock(>pmem_lock);
+page_list_del(pg, >pmem_page_list);
+spin_unlock(>pmem_lock);
+
+rc = guest_physmap_remove_page(d, _gfn(gfn), _mfn(page_to_mfn(pg)), 0);
+
+page_set_owner(pg, NULL);
+pg->count_info = PGC_state_free | PGC_pmem_page;
+
+return 0;
+}
+
+int pmem_populate(struct xen_pmem_map_args *args)
+{
+struct domain *d = args->domain;
+unsigned long i = args->nr_done;
+unsigned long mfn = args->mfn + i;
+unsigned long emfn = args->mfn + args->nr_mfns;
+unsigned long gfn = args->gfn + i;
+struct page_info *page;
+int rc = 0, err = 0;
+
+if ( unlikely(d->is_dying) )
+return -EINVAL;
+
+if ( !is_hvm_domain(d) )
+return -EINVAL;
+
+spin_lock(&pmem_data_lock);
+
+if ( !check_cover(&pmem_data_regions, mfn, emfn) )
+{
+rc = -ENXIO;
+goto out;
+}
+
+for ( ; mfn < emfn; i++, mfn++, gfn++ )
+{
+if ( i != args->nr_done && hypercall_preempt_check() )
+{
+args->preempted = 1;
+rc = -ERESTART;
+break;
+}
+
+page = mfn_to_page(mfn);
+if ( !page_state_is(page, free) )
+{
+rc = -EBUSY;
+break;
+}
+
+rc = pmem_assign_page(d, page, gfn);
+if ( rc )
+break;
+}
+
+ out:
+if ( rc && rc != -ERESTART )
+while ( i-- && !err )
+err = pmem_unassign_page(d, mfn_to_page(--mfn), --gfn);
+
+spin_unlock(&pmem_data_lock);
+
+if ( unlikely(err) )
+{
+/*
+ * If we unfortunately fails to recover from the previous
+ * failure, some PMEM pages may still be mapped to the
+ * domain. As pmem_populate() is now called only during domain
+ * creation, let's crash the domain.
+  

[Xen-devel] [RFC XEN PATCH v3 03/39] x86_64/mm: avoid cleaning the unmapped frame table

2017-09-10 Thread Haozhong Zhang
cleanup_frame_table() initializes the entire newly added frame table
to all -1's. If it's called after extend_frame_table() failed to map
the entire frame table, the initialization will hit a page fault.

Move the cleanup of partially mapped frametable to extend_frame_table(),
which has enough knowledge of the mapping status.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 51 ++--
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index c93383d7d9..f635e4bf70 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -710,15 +710,12 @@ void free_compat_arg_xlat(struct vcpu *v)
   PFN_UP(COMPAT_ARG_XLAT_SIZE));
 }
 
-static void cleanup_frame_table(struct mem_hotadd_info *info)
+static void cleanup_frame_table(unsigned long spfn, unsigned long epfn)
 {
+struct mem_hotadd_info info = { .spfn = spfn, .epfn = epfn, .cur = spfn };
 unsigned long sva, eva;
 l3_pgentry_t l3e;
 l2_pgentry_t l2e;
-unsigned long spfn, epfn;
-
-spfn = info->spfn;
-epfn = info->epfn;
 
 sva = (unsigned long)mfn_to_page(spfn);
 eva = (unsigned long)mfn_to_page(epfn);
@@ -744,7 +741,7 @@ static void cleanup_frame_table(struct mem_hotadd_info *info)
 if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) ==
   (_PAGE_PSE | _PAGE_PRESENT) )
 {
-if (hotadd_mem_valid(l2e_get_pfn(l2e), info))
+if ( hotadd_mem_valid(l2e_get_pfn(l2e), &info) )
 destroy_xen_mappings(sva & ~((1UL << L2_PAGETABLE_SHIFT) - 1),
  ((sva & ~((1UL << L2_PAGETABLE_SHIFT) -1 )) +
 (1UL << L2_PAGETABLE_SHIFT) - 1));
@@ -769,28 +766,33 @@ static int setup_frametable_chunk(void *start, void *end,
 {
 unsigned long s = (unsigned long)start;
 unsigned long e = (unsigned long)end;
-unsigned long mfn;
-int err;
+unsigned long cur, mfn;
+int err = 0;
 
 ASSERT(!(s & ((1 << L2_PAGETABLE_SHIFT) - 1)));
 ASSERT(!(e & ((1 << L2_PAGETABLE_SHIFT) - 1)));
 
-for ( ; s < e; s += (1UL << L2_PAGETABLE_SHIFT))
+for ( cur = s; cur < e; cur += (1UL << L2_PAGETABLE_SHIFT) )
 {
 mfn = alloc_hotadd_mfn(info);
-err = map_pages_to_xen(s, mfn, 1UL << PAGETABLE_ORDER,
+err = map_pages_to_xen(cur, mfn, 1UL << PAGETABLE_ORDER,
PAGE_HYPERVISOR);
 if ( err )
-return err;
+break;
 }
-memset(start, -1, s - (unsigned long)start);
 
-return 0;
+if ( !err )
+memset(start, -1, cur - s);
+else
+destroy_xen_mappings(s, cur);
+
+return err;
 }
 
 static int extend_frame_table(struct mem_hotadd_info *info)
 {
 unsigned long cidx, nidx, eidx, spfn, epfn;
+int err = 0;
 
 spfn = info->spfn;
 epfn = info->epfn;
@@ -809,8 +811,6 @@ static int extend_frame_table(struct mem_hotadd_info *info)
 
 while ( cidx < eidx )
 {
-int err;
-
 nidx = find_next_bit(pdx_group_valid, eidx, cidx);
 if ( nidx >= eidx )
 nidx = eidx;
@@ -818,14 +818,19 @@ static int extend_frame_table(struct mem_hotadd_info *info)
  pdx_to_page(nidx * PDX_GROUP_COUNT),
  info);
 if ( err )
-return err;
+break;
 
 cidx = find_next_zero_bit(pdx_group_valid, eidx, nidx);
 }
 
-memset(mfn_to_page(spfn), 0,
-   (unsigned long)mfn_to_page(epfn) - (unsigned long)mfn_to_page(spfn));
-return 0;
+if ( !err )
+memset(mfn_to_page(spfn), 0,
+   (unsigned long)mfn_to_page(epfn) -
+   (unsigned long)mfn_to_page(spfn));
+else
+cleanup_frame_table(spfn, pdx_to_pfn(cidx * PDX_GROUP_COUNT));
+
+return err;
 }
 
 void __init subarch_init_memory(void)
@@ -1404,8 +1409,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
 info.cur = spfn;
 
 ret = extend_frame_table(&info);
-if (ret)
-goto destroy_frametable;
+if ( ret )
+goto restore_node_status;
 
 /* Set max_page as setup_m2p_table will use it*/
 if (max_page < epfn)
@@ -1448,8 +1453,8 @@ destroy_m2p:
 max_page = old_max;
 total_pages = old_total;
 max_pdx = pfn_to_pdx(max_page - 1) + 1;
-destroy_frametable:
-cleanup_frame_table(&info);
+cleanup_frame_table(spfn, epfn);
+restore_node_status:
 if ( !orig_online )
 node_set_offline(node);
 NODE_DATA(node)->node_start_pfn = old_node_start;
-- 
2.14.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [RFC XEN PATCH v3 07/39] xen/pmem: register valid PMEM regions to Xen hypervisor

2017-09-10 Thread Haozhong Zhang
Register valid PMEM regions probed via NFIT with the Xen hypervisor.
No frametable or M2P table is created for those PMEM regions at this
stage.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
---
 xen/common/Makefile |   1 +
 xen/common/pmem.c   | 130 
 xen/drivers/acpi/nfit.c |  12 -
 xen/include/xen/pmem.h  |  28 +++
 4 files changed, 170 insertions(+), 1 deletion(-)
 create mode 100644 xen/common/pmem.c
 create mode 100644 xen/include/xen/pmem.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 39e2614546..46f9d1f57f 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -29,6 +29,7 @@ obj-y += notifier.o
 obj-y += page_alloc.o
 obj-$(CONFIG_HAS_PDX) += pdx.o
 obj-$(CONFIG_PERF_COUNTERS) += perfc.o
+obj-${CONFIG_NVDIMM_PMEM} += pmem.o
 obj-y += preempt.o
 obj-y += random.o
 obj-y += rangeset.o
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
new file mode 100644
index 00..49648222a6
--- /dev/null
+++ b/xen/common/pmem.c
@@ -0,0 +1,130 @@
+/*
+ * xen/common/pmem.c
+ *
+ * Copyright (C) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/errno.h>
+#include <xen/list.h>
+#include <xen/pmem.h>
+
+/*
+ * All PMEM regions present in NFIT SPA range structures are linked
+ * in this list.
+ */
+static LIST_HEAD(pmem_raw_regions);
+static unsigned int nr_raw_regions;
+
+struct pmem {
+struct list_head link; /* link to one of PMEM region list */
+unsigned long smfn;/* start MFN of the PMEM region */
+unsigned long emfn;/* end MFN of the PMEM region */
+
+union {
+struct {
+unsigned int pxm; /* proximity domain of the PMEM region */
+} raw;
+} u;
+};
+
+static bool check_overlap(unsigned long smfn1, unsigned long emfn1,
+  unsigned long smfn2, unsigned long emfn2)
+{
+return smfn1 < emfn2 && smfn2 < emfn1;
+}
+
+/**
+ * Add a PMEM region to a list. All PMEM regions in the list are
+ * sorted in ascending order of start address. A PMEM region whose
+ * range overlaps any region already in the list cannot be added.
+ *
+ * Parameters:
+ *  list:   the list to which a new PMEM region will be added
+ *  smfn, emfn: the range of the new PMEM region
+ *  entry:  return the new entry added to the list
+ *
+ * Return:
+ *  On success, return 0 and the new entry added to the list is
+ *  returned via @entry. Otherwise, return an error number and the
+ *  value of @entry is undefined.
+ */
+static int pmem_list_add(struct list_head *list,
+ unsigned long smfn, unsigned long emfn,
+ struct pmem **entry)
+{
+struct list_head *cur;
+struct pmem *new_pmem;
+int rc = 0;
+
+list_for_each_prev(cur, list)
+{
+struct pmem *cur_pmem = list_entry(cur, struct pmem, link);
+unsigned long cur_smfn = cur_pmem->smfn;
+unsigned long cur_emfn = cur_pmem->emfn;
+
+if ( check_overlap(smfn, emfn, cur_smfn, cur_emfn) )
+{
+rc = -EEXIST;
+goto out;
+}
+
+if ( cur_smfn < smfn )
+break;
+}
+
+new_pmem = xzalloc(struct pmem);
+if ( !new_pmem )
+{
+rc = -ENOMEM;
+goto out;
+}
+new_pmem->smfn = smfn;
+new_pmem->emfn = emfn;
+list_add(&new_pmem->link, cur);
+
+ out:
+if ( !rc && entry )
+*entry = new_pmem;
+
+return rc;
+}
+
+/**
+ * Register a pmem region to Xen.
+ *
+ * Parameters:
+ *  smfn, emfn: start and end MFNs of the pmem region
+ *  pxm:the proximity domain of the pmem region
+ *
+ * Return:
+ *  On success, return 0. Otherwise, an error number is returned.
+ */
+int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
+{
+int rc;
+struct pmem *pmem;
+
+if ( smfn >= emfn )
+return -EINVAL;
+
+rc = pmem_list_add(&pmem_raw_regions, smfn, emfn, &pmem);
+if ( !rc )
+{
+pmem->u.raw.pxm = pxm;
+nr_raw_regions++;
+}
+
+return rc;
+}
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
index b88a587b8d..68750c2edc 100644
--- a/xen/drivers/acpi/nfit.c
+++ b/xen/drivers/acpi

[Xen-devel] [RFC XEN PATCH v3 19/39] xen/pmem: support PMEM_REGION_TYPE_MGMT for XEN_SYSCTL_nvdimm_pmem_get_regions

2017-09-10 Thread Haozhong Zhang
Allow XEN_SYSCTL_nvdimm_pmem_get_regions to return a list of
management PMEM regions.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
---
 tools/libxc/xc_misc.c   |  8 
 xen/common/pmem.c   | 45 +
 xen/include/public/sysctl.h | 11 +++
 3 files changed, 64 insertions(+)

diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 4b5558aaa5..3ad254f5ae 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -939,6 +939,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
 size = sizeof(xen_sysctl_nvdimm_pmem_raw_region_t) * max;
 break;
 
+case PMEM_REGION_TYPE_MGMT:
+size = sizeof(xen_sysctl_nvdimm_pmem_mgmt_region_t) * max;
+break;
+
 default:
 return -EINVAL;
 }
@@ -960,6 +964,10 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
 set_xen_guest_handle(regions->u_buffer.raw_regions, buffer);
 break;
 
+case PMEM_REGION_TYPE_MGMT:
+set_xen_guest_handle(regions->u_buffer.mgmt_regions, buffer);
+break;
+
 default:
 rc = -EINVAL;
 goto out;
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index 54b3e7119a..dcd8160407 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -190,6 +190,47 @@ static int pmem_get_raw_regions(
 return rc;
 }
 
+static int pmem_get_mgmt_regions(
+XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) regions,
+unsigned int *num_regions)
+{
+struct list_head *cur;
+unsigned int nr = 0, max = *num_regions;
+xen_sysctl_nvdimm_pmem_mgmt_region_t region;
+int rc = 0;
+
+if ( !guest_handle_okay(regions, max) )
+return -EINVAL;
+
+spin_lock(&pmem_mgmt_lock);
+
+list_for_each(cur, &pmem_mgmt_regions)
+{
+struct pmem *pmem = list_entry(cur, struct pmem, link);
+
+if ( nr >= max )
+break;
+
+region.smfn = pmem->smfn;
+region.emfn = pmem->emfn;
+region.used_mfns = pmem->u.mgmt.used;
+
+if ( copy_to_guest_offset(regions, nr, &region, 1) )
+{
+rc = -EFAULT;
+break;
+}
+
+nr++;
+}
+
+spin_unlock(&pmem_mgmt_lock);
+
+*num_regions = nr;
+
+return rc;
+}
+
 static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 {
 unsigned int type = regions->type, max = regions->num_regions;
@@ -204,6 +245,10 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 rc = pmem_get_raw_regions(regions->u_buffer.raw_regions, &max);
 break;
 
+case PMEM_REGION_TYPE_MGMT:
+rc = pmem_get_mgmt_regions(regions->u_buffer.mgmt_regions, &max);
+break;
+
 default:
 rc = -EINVAL;
 }
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 5d208033a0..f825716446 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1131,6 +1131,15 @@ struct xen_sysctl_nvdimm_pmem_raw_region {
 typedef struct xen_sysctl_nvdimm_pmem_raw_region xen_sysctl_nvdimm_pmem_raw_region_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_raw_region_t);
 
+/* PMEM_REGION_TYPE_MGMT */
+struct xen_sysctl_nvdimm_pmem_mgmt_region {
+uint64_t smfn;
+uint64_t emfn;
+uint64_t used_mfns;
+};
+typedef struct xen_sysctl_nvdimm_pmem_mgmt_region xen_sysctl_nvdimm_pmem_mgmt_region_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_mgmt_region_t);
+
 /* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
 struct xen_sysctl_nvdimm_pmem_regions_nr {
 uint8_t type; /* IN: one of PMEM_REGION_TYPE_* */
@@ -1149,6 +1158,8 @@ struct xen_sysctl_nvdimm_pmem_regions {
 union {
 /* if type == PMEM_REGION_TYPE_RAW */
 XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_raw_region_t) raw_regions;
+/* if type == PMEM_REGION_TYPE_MGMT */
+XEN_GUEST_HANDLE_64(xen_sysctl_nvdimm_pmem_mgmt_region_t) mgmt_regions;
 } u_buffer;   /* IN: the guest handler where the entries of PMEM
  regions of the type @type are returned */
 };
-- 
2.14.1




[Xen-devel] [RFC XEN PATCH v3 01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()

2017-09-10 Thread Haozhong Zhang
The current check refuses hot-plugged memory that falls entirely
within one unused PDX group, which should be allowed.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 11746730b4..6c5221f90c 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1296,12 +1296,8 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
 return 0;
 
 /* Make sure the new range is not present now */
-sidx = ((pfn_to_pdx(spfn) + PDX_GROUP_COUNT - 1)  & ~(PDX_GROUP_COUNT - 1))
-/ PDX_GROUP_COUNT;
+sidx = (pfn_to_pdx(spfn) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;
 eidx = (pfn_to_pdx(epfn - 1) & ~(PDX_GROUP_COUNT - 1)) / PDX_GROUP_COUNT;
-if (sidx >= eidx)
-return 0;
-
 s = find_next_zero_bit(pdx_group_valid, eidx, sidx);
 if ( s > eidx )
 return 0;
-- 
2.14.1




[Xen-devel] [RFC XEN PATCH v3 10/39] xen/pmem: add XEN_SYSCTL_nvdimm_pmem_get_regions_nr

2017-09-10 Thread Haozhong Zhang
XEN_SYSCTL_nvdimm_pmem_get_regions_nr, a command of the hypercall
XEN_SYSCTL_nvdimm_op, returns the number of PMEM regions of the
specified type (see PMEM_REGION_TYPE_*).

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
---
 tools/libxc/include/xenctrl.h | 15 +++
 tools/libxc/xc_misc.c | 24 
 xen/common/pmem.c | 29 -
 xen/include/public/sysctl.h   | 16 ++--
 4 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 43151cb415..e4d26967ba 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2572,6 +2572,21 @@ int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout);
 int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
  xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
 
+/*
+ * Get the number of PMEM regions of the specified type.
+ *
+ * Parameters:
+ *  xch:  xc interface handle
+ *  type: the type of PMEM regions, must be one of PMEM_REGION_TYPE_*
+ *  nr:   the number of PMEM regions is returned via this parameter
+ *
+ * Return:
+ *  On success, return 0 and the number of PMEM regions is returned via @nr.
+ *  Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch,
+  uint8_t type, uint32_t *nr);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 7e15e904e3..fa66410869 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -888,6 +888,30 @@ int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout)
 return _xc_livepatch_action(xch, name, LIVEPATCH_ACTION_REPLACE, timeout);
 }
 
+int xc_nvdimm_pmem_get_regions_nr(xc_interface *xch, uint8_t type, uint32_t *nr)
+{
+DECLARE_SYSCTL;
+xen_sysctl_nvdimm_op_t *nvdimm = &sysctl.u.nvdimm;
+int rc;
+
+if ( !nr || type != PMEM_REGION_TYPE_RAW )
+return -EINVAL;
+
+sysctl.cmd = XEN_SYSCTL_nvdimm_op;
+nvdimm->cmd = XEN_SYSCTL_nvdimm_pmem_get_regions_nr;
+nvdimm->pad = 0;
+nvdimm->u.pmem_regions_nr.type = type;
+nvdimm->err = 0;
+
+rc = do_sysctl(xch, &sysctl);
+if ( !rc )
+*nr = nvdimm->u.pmem_regions_nr.num_regions;
+else if ( nvdimm->err )
+rc = nvdimm->err;
+
+return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index d67f237cd5..995dfcb867 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -105,6 +105,23 @@ static int pmem_list_add(struct list_head *list,
 return rc;
 }
 
+static int pmem_get_regions_nr(xen_sysctl_nvdimm_pmem_regions_nr_t *regions_nr)
+{
+int rc = 0;
+
+switch ( regions_nr->type )
+{
+case PMEM_REGION_TYPE_RAW:
+regions_nr->num_regions = nr_raw_regions;
+break;
+
+default:
+rc = -EINVAL;
+}
+
+return rc;
+}
+
 /**
  * Register a pmem region to Xen.
  *
@@ -142,7 +159,17 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
  */
 int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
 {
-int rc = -ENOSYS;
+int rc;
+
+switch ( nvdimm->cmd )
+{
+case XEN_SYSCTL_nvdimm_pmem_get_regions_nr:
+rc = pmem_get_regions_nr(&nvdimm->u.pmem_regions_nr);
+break;
+
+default:
+rc = -ENOSYS;
+}
 
 nvdimm->err = -rc;
 
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index e8272ae968..cf308bbc45 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1118,11 +1118,23 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t);
  * Interface for NVDIMM management.
  */
 
+/* Types of PMEM regions */
+#define PMEM_REGION_TYPE_RAW0 /* PMEM regions detected by Xen */
+
+/* XEN_SYSCTL_nvdimm_pmem_get_regions_nr */
+struct xen_sysctl_nvdimm_pmem_regions_nr {
+uint8_t type; /* IN: one of PMEM_REGION_TYPE_* */
+uint32_t num_regions; /* OUT: the number of PMEM regions of type @type */
+};
+typedef struct xen_sysctl_nvdimm_pmem_regions_nr xen_sysctl_nvdimm_pmem_regions_nr_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_pmem_regions_nr_t);
+
 struct xen_sysctl_nvdimm_op {
-uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
+uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*. */
+#define XEN_SYSCTL_nvdimm_pmem_get_regions_nr 0
 uint32_t pad; /* IN: Always zero. */
 union {
-/* Parameters of XEN_SYSCTL_nvdimm_* will be added here. */
+xen_sysctl_nvdimm_pmem_regions_nr_t pmem_regions_nr;
 } u;
 uint32_t err; /* OUT: error code */
 };
-- 
2.14.1



[Xen-devel] [RFC XEN PATCH v3 21/39] xen/pmem: support setup PMEM region for guest data usage

2017-09-10 Thread Haozhong Zhang
Allow the command XEN_SYSCTL_nvdimm_pmem_setup of the hypercall
XEN_SYSCTL_nvdimm_op to set up a PMEM region for guest data usage.
After the setup, that PMEM region can be mapped into guest address
space.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
---
 tools/libxc/include/xenctrl.h |  22 
 tools/libxc/xc_misc.c |  17 ++
 xen/common/pmem.c | 118 +-
 xen/include/public/sysctl.h   |   3 +-
 4 files changed, 157 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 7c5707fe11..41e5e3408c 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2621,6 +2621,28 @@ int xc_nvdimm_pmem_get_regions(xc_interface *xch, uint8_t type,
 int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
   unsigned long smfn, unsigned long emfn);
 
+/*
+ * Set up the specified PMEM pages for guest data usage. On success,
+ * these PMEM pages can be mapped to guests and used as the backend
+ * of vNVDIMM devices.
+ *
+ * Parameters:
+ *  xch:xc interface handle
+ *  smfn, emfn: the start and end of the PMEM region
+ *  mgmt_smfn,
+ *  mgmt_emfn:  the start and the end MFN of the PMEM region that is
+ *  used to manage this PMEM region. It must be in one of
+ *  those added by xc_nvdimm_pmem_setup_mgmt() calls, and
+ *  not overlap with @smfn - @emfn.
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int xc_nvdimm_pmem_setup_data(xc_interface *xch,
+  unsigned long smfn, unsigned long emfn,
+  unsigned long mgmt_smfn, unsigned long mgmt_emfn);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 3ad254f5ae..ef2e9e0656 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -1019,6 +1019,23 @@ int xc_nvdimm_pmem_setup_mgmt(xc_interface *xch,
 return rc;
 }
 
+int xc_nvdimm_pmem_setup_data(xc_interface *xch,
+  unsigned long smfn, unsigned long emfn,
+  unsigned long mgmt_smfn, unsigned long mgmt_emfn)
+{
+DECLARE_SYSCTL;
+int rc;
+
+xc_nvdimm_pmem_setup_common(&sysctl, smfn, emfn, mgmt_smfn, mgmt_emfn);
+sysctl.u.nvdimm.u.pmem_setup.type = PMEM_REGION_TYPE_DATA;
+
+rc = do_sysctl(xch, );
+if ( rc && sysctl.u.nvdimm.err )
+rc = -sysctl.u.nvdimm.err;
+
+return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index dcd8160407..6891ed7a47 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -34,16 +34,26 @@ static unsigned int nr_raw_regions;
 /*
  * All PMEM regions reserved for management purpose are linked to this
  * list. All of them must be covered by one or multiple PMEM regions
- * in list pmem_raw_regions.
+ * in list pmem_raw_regions, and not appear in list pmem_data_regions.
  */
 static LIST_HEAD(pmem_mgmt_regions);
 static DEFINE_SPINLOCK(pmem_mgmt_lock);
 static unsigned int nr_mgmt_regions;
 
+/*
+ * All PMEM regions that can be mapped to guest are linked to this
+ * list. All of them must be covered by one or multiple PMEM regions
+ * in list pmem_raw_regions, and not appear in list pmem_mgmt_regions.
+ */
+static LIST_HEAD(pmem_data_regions);
+static DEFINE_SPINLOCK(pmem_data_lock);
+static unsigned int nr_data_regions;
+
 struct pmem {
 struct list_head link; /* link to one of PMEM region list */
 unsigned long smfn;/* start MFN of the PMEM region */
 unsigned long emfn;/* end MFN of the PMEM region */
+spinlock_t lock;
 
 union {
 struct {
@@ -53,6 +63,11 @@ struct pmem {
 struct {
 unsigned long used; /* # of used pages in MGMT PMEM region */
 } mgmt;
+
+struct {
+unsigned long mgmt_smfn; /* start MFN of management region */
+unsigned long mgmt_emfn; /* end MFN of management region */
+} data;
 } u;
 };
 
@@ -111,6 +126,7 @@ static int pmem_list_add(struct list_head *list,
 }
 new_pmem->smfn = smfn;
 new_pmem->emfn = emfn;
+spin_lock_init(_pmem->lock);
 list_add(_pmem->link, cur);
 
  out:
@@ -261,9 +277,16 @@ static int pmem_get_regions(xen_sysctl_nvdimm_pmem_regions_t *regions)
 
 static bool check_mgmt_size(unsigned long mgmt_mfns, unsigned long total_mfns)
 {
-return mgmt_mfns >=
+unsigned long required =
 ((sizeof(struct page_info) * total_mfns) >> PAGE_SHIFT) +
 ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
+
+if ( required > mgmt_mfns )
+printk(XEN

[Xen-devel] [RFC XEN PATCH v3 17/39] tools/xen-ndctl: add command 'setup-mgmt'

2017-09-10 Thread Haozhong Zhang
This command asks the Xen hypervisor to set up the specified PMEM
range for management usage.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/misc/xen-ndctl.c | 45 +
 1 file changed, 45 insertions(+)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index 6277a1eda2..1289a83dbe 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -36,6 +36,7 @@ static xc_interface *xch;
 static int handle_help(int argc, char *argv[]);
 static int handle_list(int argc, char *argv[]);
 static int handle_list_cmds(int argc, char *argv[]);
+static int handle_setup_mgmt(int argc, char *argv[]);
 
 static const struct xen_ndctl_cmd
 {
@@ -69,6 +70,14 @@ static const struct xen_ndctl_cmd
 .help= "List all supported commands.\n",
 .handler = handle_list_cmds,
 },
+
+{
+.name= "setup-mgmt",
+.syntax  = "<smfn> <emfn>",
+.help= "Setup a PMEM region from MFN 'smfn' to 'emfn' for management usage.\n\n",
+.handler = handle_setup_mgmt,
+.need_xc = true,
+},
 };
 
 static const unsigned int nr_cmds = sizeof(cmds) / sizeof(cmds[0]);
@@ -197,6 +206,42 @@ static int handle_list_cmds(int argc, char *argv[])
 return 0;
 }
 
+static bool string_to_mfn(const char *str, unsigned long *ret)
+{
+unsigned long l;
+char *end;
+
+errno = 0;
+l = strtoul(str, &end, 0);
+
+if ( errno || end == str || *end != '\0' )
+{
+fprintf(stderr, "Invalid MFN %s: %s\n", str,
+errno ? strerror(errno) : "not a number");
+return false;
+}
+
+*ret = l;
+
+return true;
+}
+
+static int handle_setup_mgmt(int argc, char **argv)
+{
+unsigned long smfn, emfn;
+
+if ( argc < 3 )
+{
+fprintf(stderr, "Too few arguments.\n\n");
+show_help(argv[0]);
+return -EINVAL;
+}
+
+if ( !string_to_mfn(argv[1], &smfn) ||
+ !string_to_mfn(argv[2], &emfn) )
+return -EINVAL;
+
+if ( argc > 3 )
+return handle_unrecognized_argument(argv[0], argv[3]);
+
+return xc_nvdimm_pmem_setup_mgmt(xch, smfn, emfn);
+}
+
 int main(int argc, char *argv[])
 {
 unsigned int i;
-- 
2.14.1




[Xen-devel] [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains

2017-09-10 Thread Haozhong Zhang
Overview
==

(RFC v2 can be found at https://lists.xen.org/archives/html/xen-devel/2017-03/msg02401.html)

This RFC v3 has changed and grown considerably since previous
versions. The primary changes are listed below; most of them are
intended to simplify the first implementation and avoid further
growth.

1. Drop the support to maintain the frametable and M2P table of PMEM
   in RAM. In the future, we may add this support back.

2. Hide host NFIT and deny access to host PMEM from Dom0. In other
   words, the kernel NVDIMM driver is loaded in Dom 0 and existing
   management utilities (e.g. ndctl) do not work in Dom0 anymore. This
   is to work around the interference of PMEM accesses between Dom0 and the Xen
   hypervisor. In the future, we may add a stub driver in Dom0 which
   will hold the PMEM pages being used by Xen hypervisor and/or other
   domains.

3. As there is no NVDIMM driver or management utility in Dom0 now,
   we cannot easily specify an area of host NVDIMM (e.g., by /dev/pmem0)
   and manage NVDIMM in Dom0 (e.g., creating labels).  Instead, we
   have to specify the exact MFNs of host PMEM pages in xl domain
   configuration files and the newly added Xen NVDIMM management
   utility xen-ndctl.

   If there are indeed some tasks that have to be handled by existing
   driver and management utilities, such as recovery from hardware
   failures, they have to be accomplished out of Xen environment.

   After 2. is solved in the future, we would be able to make existing
   driver and management utilities work in Dom0 again.

All patches can be found at
  Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
  QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3


How to Test
==

1. Build and install this patchset with the associated QEMU patches.

2. Use xen-ndctl to get a list of PMEM regions detected by Xen
   hypervisor, e.g.
   
 # xen-ndctl list --raw
 Raw PMEM regions:
  0: MFN 0x48 - 0x88, PXM 3

   which indicates a PMEM region is present at MFN 0x48 - 0x88.

3. Setup a management area to manage the guest data areas.

 # xen-ndctl setup-mgmt 0x48 0x4c
 # xen-ndctl list --mgmt
 Management PMEM regions:
  0: MFN 0x48 - 0x4c, used 0xc00
 
   The first command sets up the PMEM area in MFN 0x48 - 0x4c
   (1GB) as a management area, which is also used to manage itself.
   The second command lists all management areas; the 'used' field
   shows the number of pages that have been used from the beginning
   of that area.

   The size ratio between a management area and areas that it manages
   (including itself) should be at least 1 : 100 (i.e., 32 bytes for
   frametable and 8 bytes for M2P table per page).

   The size of a management area as well as a data area below is
   currently restricted to 256 Mbytes or multiples. The alignment is
   restricted to 2 Mbytes or multiples.

4. Setup a data area that can be used by guest.

 # xen-ndctl setup-data 0x4c 0x88 0x480c00 0x4c
 # xen-ndctl list --data
 Data PMEM regions:
  0: MFN 0x4c - 0x88, MGMT MFN 0x480c00 - 0x48b000

   The first command sets up the remaining PMEM pages from MFN 0x4c
   to 0x88 as a data area. The management area from MFN 0x480c00
   to 0x4c is specified to manage this data area. The management
   pages actually used can be shown by the second command.

5. Assign data pages to an HVM domain by adding the following line in
   the domain configuration.

 vnvdimms = [ 'type=mfn, backend=0x4c, nr_pages=0x10' ]

   which assigns 4 Gbytes PMEM starting from MFN 0x4c to that
   domain. A 4 Gbytes PMEM should be present in guest (e.g., as
   /dev/pmem0) after above steps of setup.

   There can be one or multiple entries in vnvdimms, which must not
   overlap with each other. Sharing PMEM pages between domains is not
   supported, so the PMEM pages assigned to different domains must
   not overlap.


Patch Organization
==

This RFC v3 is composed of following 6 parts per the task they are
going to solve. The tool stack patches are collected and separated
into each part.

- Part 0. Bug fix and code cleanup
[01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()
[02/39] x86_64/mm: drop redundant MFN to page conventions in 
cleanup_frame_table()
[03/39] x86_64/mm: avoid cleaning the unmapped frame table

- Part 1. Detect host PMEM
  Detect host PMEM via NFIT. No frametable and M2P table for them are
  created in this part.

[04/39] xen/common: add Kconfig item for pmem support
[05/39] x86/mm: exclude PMEM regions from initial frametable
[06/39] acpi: probe valid PMEM regions via NFIT
[07/39] xen/pmem: register valid PMEM regions to Xen hypervisor
[08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0
[09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
[10/39] xen/pmem: add 

[Xen-devel] [RFC XEN PATCH v3 09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op

2017-09-10 Thread Haozhong Zhang
XEN_SYSCTL_nvdimm_op will support a set of sub-commands to manage the
physical NVDIMM devices. This commit just adds the framework for this
hypercall, and does not implement any sub-commands.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Daniel De Graaf <dgde...@tycho.nsa.gov>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
---
 tools/flask/policy/modules/dom0.te  |  2 +-
 xen/common/pmem.c   | 18 ++
 xen/common/sysctl.c |  9 +
 xen/include/public/sysctl.h | 19 ++-
 xen/include/xen/pmem.h  |  2 ++
 xen/xsm/flask/hooks.c   |  4 
 xen/xsm/flask/policy/access_vectors |  2 ++
 7 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/modules/dom0.te b/tools/flask/policy/modules/dom0.te
index 338caaf41e..8a817b0b55 100644
--- a/tools/flask/policy/modules/dom0.te
+++ b/tools/flask/policy/modules/dom0.te
@@ -16,7 +16,7 @@ allow dom0_t xen_t:xen {
 allow dom0_t xen_t:xen2 {
resource_op psr_cmt_op psr_cat_op pmu_ctrl get_symbol
get_cpu_levelling_caps get_cpu_featureset livepatch_op
-   gcov_op set_parameter
+   gcov_op set_parameter nvdimm_op
 };
 
 # Allow dom0 to use all XENVER_ subops that have checks.
diff --git a/xen/common/pmem.c b/xen/common/pmem.c
index c9f5f6e904..d67f237cd5 100644
--- a/xen/common/pmem.c
+++ b/xen/common/pmem.c
@@ -131,6 +131,24 @@ int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm)
 return rc;
 }
 
+/**
+ * Top-level hypercall handler of XEN_SYSCTL_nvdimm_pmem_*.
+ *
+ * Parameters:
+ *  nvdimm: the hypercall parameters
+ *
+ * Return:
+ *  On success, return 0. Otherwise, return a non-zero error code.
+ */
+int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm)
+{
+int rc = -ENOSYS;
+
+nvdimm->err = -rc;
+
+return rc;
+}
+
 #ifdef CONFIG_X86
 
 int __init pmem_dom0_setup_permission(struct domain *d)
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index a6882d1c9d..33c8fca081 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include <xen/pmem.h>
 
 long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
@@ -503,6 +504,14 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 break;
 }
 
+#ifdef CONFIG_NVDIMM_PMEM
+case XEN_SYSCTL_nvdimm_op:
+ret = pmem_do_sysctl(&op->u.nvdimm);
+if ( ret != -ENOSYS )
+copyback = 1;
+break;
+#endif
+
 default:
 ret = arch_do_sysctl(op, u_sysctl);
 copyback = 0;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 7830b987da..e8272ae968 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -36,7 +36,7 @@
 #include "physdev.h"
 #include "tmem.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x000F
+#define XEN_SYSCTL_INTERFACE_VERSION 0x0010
 
 /*
  * Read console content from Xen buffer ring.
@@ -1114,6 +1114,21 @@ struct xen_sysctl_set_parameter {
 typedef struct xen_sysctl_set_parameter xen_sysctl_set_parameter_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_set_parameter_t);
 
+/*
+ * Interface for NVDIMM management.
+ */
+
+struct xen_sysctl_nvdimm_op {
+uint32_t cmd; /* IN: XEN_SYSCTL_nvdimm_*; none is implemented yet. */
+uint32_t pad; /* IN: Always zero. */
+union {
+/* Parameters of XEN_SYSCTL_nvdimm_* will be added here. */
+} u;
+uint32_t err; /* OUT: error code */
+};
+typedef struct xen_sysctl_nvdimm_op xen_sysctl_nvdimm_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_nvdimm_op_t);
+
 struct xen_sysctl {
 uint32_t cmd;
 #define XEN_SYSCTL_readconsole1
@@ -1143,6 +1158,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_get_cpu_featureset26
 #define XEN_SYSCTL_livepatch_op  27
 #define XEN_SYSCTL_set_parameter 28
+#define XEN_SYSCTL_nvdimm_op 29
 uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
 union {
 struct xen_sysctl_readconsole   readconsole;
@@ -1172,6 +1188,7 @@ struct xen_sysctl {
 struct xen_sysctl_cpu_featuresetcpu_featureset;
 struct xen_sysctl_livepatch_op  livepatch;
 struct xen_sysctl_set_parameter set_parameter;
+struct xen_sysctl_nvdimm_op nvdimm;
 uint8_t pad[128];
 } u;
 };
diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
index d5bd54ff19..922b12f570 100644
--- a/xen/include/xen/pmem.h
+++ b/xen/include/xen/pmem.h
@@ -20,9 +20,11 @@
 #define __XEN_PMEM_H__
 #ifdef CONFIG_NVDIMM_PMEM
 
+#include <public/sysctl.h>
 #include 
 
 int pmem_register(unsigned long smfn, unsigned long emfn, unsigned int pxm);
+int pmem_do_sysctl(struct xen_sysctl_nvdimm_op *nvdimm);
 
 #ifdef CONFIG_X86
 
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm

[Xen-devel] [RFC XEN PATCH v3 14/39] x86_64/mm: refactor memory_add()

2017-09-10 Thread Haozhong Zhang
Separate the revertible part of memory_add() into memory_add_common(),
which will also be used in PMEM management. The separation will ease
failure recovery in PMEM management. Several coding-style issues in the
touched code are fixed as well.

No functional change is introduced.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 xen/arch/x86/x86_64/mm.c | 98 +++-
 1 file changed, 56 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index f635e4bf70..c8ffafe8a8 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1337,21 +1337,16 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
 return 1;
 }
 
-/*
- * A bit paranoid for memory allocation failure issue since
- * it may be reason for memory add
- */
-int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
+static int memory_add_common(struct mem_hotadd_info *info,
+ unsigned int pxm, bool direct_map)
 {
-struct mem_hotadd_info info;
+unsigned long spfn = info->spfn, epfn = info->epfn;
 int ret;
 nodeid_t node;
 unsigned long old_max = max_page, old_total = total_pages;
 unsigned long old_node_start, old_node_span, orig_online;
 unsigned long i;
 
-dprintk(XENLOG_INFO, "memory_add %lx ~ %lx with pxm %x\n", spfn, epfn, pxm);
-
 if ( !mem_hotadd_check(spfn, epfn) )
 return -EINVAL;
 
@@ -1366,22 +1361,25 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
 return -EINVAL;
 }
 
-i = virt_to_mfn(HYPERVISOR_VIRT_END - 1) + 1;
-if ( spfn < i )
-{
-ret = map_pages_to_xen((unsigned long)mfn_to_virt(spfn), spfn,
-   min(epfn, i) - spfn, PAGE_HYPERVISOR);
-if ( ret )
-goto destroy_directmap;
-}
-if ( i < epfn )
+if ( direct_map )
 {
-if ( i < spfn )
-i = spfn;
-ret = map_pages_to_xen((unsigned long)mfn_to_virt(i), i,
-   epfn - i, __PAGE_HYPERVISOR_RW);
-if ( ret )
-goto destroy_directmap;
+i = virt_to_mfn(HYPERVISOR_VIRT_END - 1) + 1;
+if ( spfn < i )
+{
+ret = map_pages_to_xen((unsigned long)mfn_to_virt(spfn), spfn,
+   min(epfn, i) - spfn, PAGE_HYPERVISOR);
+if ( ret )
+goto destroy_directmap;
+}
+if ( i < epfn )
+{
+if ( i < spfn )
+i = spfn;
+ret = map_pages_to_xen((unsigned long)mfn_to_virt(i), i,
+   epfn - i, __PAGE_HYPERVISOR_RW);
+if ( ret )
+goto destroy_directmap;
+}
 }
 
 old_node_start = node_start_pfn(node);
@@ -1398,22 +1396,18 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
 }
 else
 {
-if (node_start_pfn(node) > spfn)
+if ( node_start_pfn(node) > spfn )
 NODE_DATA(node)->node_start_pfn = spfn;
-if (node_end_pfn(node) < epfn)
+if ( node_end_pfn(node) < epfn )
 NODE_DATA(node)->node_spanned_pages = epfn - node_start_pfn(node);
 }
 
-info.spfn = spfn;
-info.epfn = epfn;
-info.cur = spfn;
-
-ret = extend_frame_table();
+ret = extend_frame_table(info);
 if ( ret )
 goto restore_node_status;
 
 /* Set max_page as setup_m2p_table will use it*/
-if (max_page < epfn)
+if ( max_page < epfn )
 {
 max_page = epfn;
 max_pdx = pfn_to_pdx(max_page - 1) + 1;
@@ -1421,7 +1415,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
 total_pages += epfn - spfn;
 
 set_pdx_range(spfn, epfn);
-ret = setup_m2p_table();
+ret = setup_m2p_table(info);
 
 if ( ret )
 goto destroy_m2p;
@@ -1429,11 +1423,12 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
 if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
 {
 for ( i = spfn; i < epfn; i++ )
-if ( iommu_map_page(hardware_domain, i, i, IOMMUF_readable|IOMMUF_writable) )
+if ( iommu_map_page(hardware_domain, i, i,
+IOMMUF_readable|IOMMUF_writable) )
 break;
 if ( i != epfn )
 {
-while (i-- > old_max)
+while ( i-- > old_max )
 /* If statement to satisfy __must_check. */
 if ( iommu_unmap_page(hardware_domain, i) )
 continue;
@@ -1442,14 +1437,10 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
 }
 }
 
-/* We c

[Xen-devel] [RFC XEN PATCH v3 06/39] acpi: probe valid PMEM regions via NFIT

2017-09-10 Thread Haozhong Zhang
A PMEM region with failures (e.g., one that was not properly flushed
in the last power cycle, or that contains broken blocks) cannot be
safely used by Xen or guests. Scan the state flags of the NVDIMM
region mapping structures in NFIT to check whether any failures have
happened to a PMEM region. The recovery from those failures is left
out of Xen (e.g. left to the firmware or other management utilities
on the bare metal).

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 xen/arch/x86/acpi/boot.c  |   4 ++
 xen/drivers/acpi/nfit.c   | 153 +-
 xen/include/acpi/actbl1.h |  26 
 xen/include/xen/acpi.h|   1 +
 4 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/acpi/boot.c b/xen/arch/x86/acpi/boot.c
index 8e6c96dcf6..f52a2c6dc5 100644
--- a/xen/arch/x86/acpi/boot.c
+++ b/xen/arch/x86/acpi/boot.c
@@ -732,5 +732,9 @@ int __init acpi_boot_init(void)
 
acpi_table_parse(ACPI_SIG_BGRT, acpi_invalidate_bgrt);
 
+#ifdef CONFIG_NVDIMM_PMEM
+   acpi_nfit_init();
+#endif
+
return 0;
 }
diff --git a/xen/drivers/acpi/nfit.c b/xen/drivers/acpi/nfit.c
index e099378ee0..b88a587b8d 100644
--- a/xen/drivers/acpi/nfit.c
+++ b/xen/drivers/acpi/nfit.c
@@ -31,11 +31,143 @@ static const uint8_t nfit_spa_pmem_guid[] =
 0xac, 0x43, 0x0d, 0x33, 0x18, 0xb7, 0x8c, 0xdb,
 };
 
+struct nfit_spa_desc {
+struct list_head link;
+struct acpi_nfit_system_address *acpi_table;
+};
+
+struct nfit_memdev_desc {
+struct list_head link;
+struct acpi_nfit_memory_map *acpi_table;
+struct nfit_spa_desc *spa_desc;
+};
+
 struct acpi_nfit_desc {
 struct acpi_table_nfit *acpi_table;
+struct list_head spa_list;
+struct list_head memdev_list;
 };
 
-static struct acpi_nfit_desc nfit_desc;
+static struct acpi_nfit_desc nfit_desc = {
+.spa_list = LIST_HEAD_INIT(nfit_desc.spa_list),
+.memdev_list = LIST_HEAD_INIT(nfit_desc.memdev_list),
+};
+
+static void __init acpi_nfit_del_subtables(struct acpi_nfit_desc *desc)
+{
+struct nfit_spa_desc *spa, *spa_next;
+struct nfit_memdev_desc *memdev, *memdev_next;
+
+list_for_each_entry_safe(spa, spa_next, &desc->spa_list, link)
+{
+list_del(&spa->link);
+xfree(spa);
+}
+list_for_each_entry_safe (memdev, memdev_next, &desc->memdev_list, link)
+{
+list_del(&memdev->link);
+xfree(memdev);
+}
+}
+
+static int __init acpi_nfit_add_subtables(struct acpi_nfit_desc *desc)
+{
+struct acpi_table_nfit *nfit_table = desc->acpi_table;
+uint32_t hdr_offset = sizeof(*nfit_table);
+uint32_t nfit_length = nfit_table->header.length;
+struct acpi_nfit_header *hdr;
+struct nfit_spa_desc *spa_desc;
+struct nfit_memdev_desc *memdev_desc;
+int ret = 0;
+
+#define INIT_DESC(desc, acpi_hdr, acpi_type, desc_list) \
+do {\
+(desc) = xzalloc(typeof(*(desc)));  \
+if ( unlikely(!(desc)) ) {  \
+ret = -ENOMEM;  \
+goto nomem; \
+}   \
+(desc)->acpi_table = (acpi_type *)(acpi_hdr);   \
+INIT_LIST_HEAD(&(desc)->link);  \
+list_add_tail(&(desc)->link, (desc_list));  \
+} while ( 0 )
+
+while ( hdr_offset < nfit_length )
+{
+hdr = (void *)nfit_table + hdr_offset;
+hdr_offset += hdr->length;
+
+switch ( hdr->type )
+{
+case ACPI_NFIT_TYPE_SYSTEM_ADDRESS:
+INIT_DESC(spa_desc, hdr, struct acpi_nfit_system_address,
+  &desc->spa_list);
+break;
+
+case ACPI_NFIT_TYPE_MEMORY_MAP:
+INIT_DESC(memdev_desc, hdr, struct acpi_nfit_memory_map,
+  &desc->memdev_list);
+break;
+
+default:
+continue;
+}
+}
+
+#undef INIT_DESC
+
+return 0;
+
+ nomem:
+acpi_nfit_del_subtables(desc);
+
+return ret;
+}
+
+static void __init acpi_nfit_link_subtables(struct acpi_nfit_desc *desc)
+{
+struct nfit_spa_desc *spa_desc;
+struct nfit_memdev_desc *memdev_desc;
+uint16_t spa_idx;
+
+list_for_each_entry(memdev_desc, &desc->memdev_list, link)
+{
+spa_idx = memdev_desc->acpi_table->range_index;
+list_for_each_entry(spa_desc, &desc->spa_list, link)
+{
+if ( spa_desc->acpi_table->range_index == spa_idx )
+break;
+}
+memdev_desc->spa_desc = spa_desc;
+}
+}
+
+static void __init acpi_nfit_register_pmem(struct acpi_nfit_desc *desc)
+{
+struct nfit_spa_desc *spa_desc;
+struct nfit_memdev_desc *memdev_desc;
+struct acpi_nfit_system_address *spa;
+unsigned long smfn, e

[Xen-devel] [RFC XEN PATCH v3 13/39] tools/xen-ndctl: add command 'list'

2017-09-10 Thread Haozhong Zhang
Command 'list' supports two options. '--raw' lists all PMEM regions
detected by the Xen hypervisor, which can later be configured for
further use. '--all', the default, implies all other options
(i.e. '--raw' and any future ones).

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/misc/xen-ndctl.c | 75 ++
 1 file changed, 75 insertions(+)

diff --git a/tools/misc/xen-ndctl.c b/tools/misc/xen-ndctl.c
index de40e29ff6..6277a1eda2 100644
--- a/tools/misc/xen-ndctl.c
+++ b/tools/misc/xen-ndctl.c
@@ -27,12 +27,14 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
 static xc_interface *xch;
 
 static int handle_help(int argc, char *argv[]);
+static int handle_list(int argc, char *argv[]);
 static int handle_list_cmds(int argc, char *argv[]);
 
 static const struct xen_ndctl_cmd
@@ -52,6 +54,15 @@ static const struct xen_ndctl_cmd
 .handler = handle_help,
 },
 
+{
+.name= "list",
+.syntax  = "[--all | --raw ]",
+.help= "--all: the default option, list all PMEM regions of following types.\n"
+   "--raw: list all PMEM regions detected by Xen hypervisor.\n",
+.handler = handle_list,
+.need_xc = true,
+},
+
 {
 .name= "list-cmds",
 .syntax  = "",
@@ -109,6 +120,70 @@ static int handle_help(int argc, char *argv[])
 return 0;
 }
 
+static int handle_list_raw(void)
+{
+int rc;
+unsigned int nr = 0, i;
+xen_sysctl_nvdimm_pmem_raw_region_t *raw_list;
+
+rc = xc_nvdimm_pmem_get_regions_nr(xch, PMEM_REGION_TYPE_RAW, &nr);
+if ( rc )
+{
+fprintf(stderr, "Cannot get the number of PMEM regions: %s.\n", strerror(-rc));
+return rc;
+}
+
+raw_list = malloc(nr * sizeof(*raw_list));
+if ( !raw_list )
+return -ENOMEM;
+
+rc = xc_nvdimm_pmem_get_regions(xch, PMEM_REGION_TYPE_RAW, raw_list, &nr);
+if ( rc )
+goto out;
+
+printf("Raw PMEM regions:\n");
+for ( i = 0; i < nr; i++ )
+printf(" %u: MFN 0x%lx - 0x%lx, PXM %u\n",
+   i, raw_list[i].smfn, raw_list[i].emfn, raw_list[i].pxm);
+
+ out:
+free(raw_list);
+
+return rc;
+}
+
+static const struct list_handlers {
+const char *option;
+int (*handler)(void);
+} list_hndrs[] =
+{
+{ "--raw", handle_list_raw },
+};
+
+static const unsigned int nr_list_hndrs =
+sizeof(list_hndrs) / sizeof(list_hndrs[0]);
+
+static int handle_list(int argc, char *argv[])
+{
+bool list_all = argc <= 1 || !strcmp(argv[1], "--all");
+unsigned int i;
+bool handled = false;
+int rc = 0;
+
+for ( i = 0; i < nr_list_hndrs && !rc; i++)
+if ( list_all || !strcmp(argv[1], list_hndrs[i].option) )
+{
+rc = list_hndrs[i].handler();
+handled = true;
+}
+
+if ( !handled )
+return handle_unrecognized_argument(argv[0], argv[1]);
+
+return rc;
+}
+
 static int handle_list_cmds(int argc, char *argv[])
 {
 unsigned int i;
-- 
2.14.1
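The table-driven option dispatch used by handle_list() above can be
sketched in isolation. The sketch below mirrors the pattern only; all
names in it (dispatch, show_raw, opt_handler) are illustrative and not
part of xen-ndctl:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative handler table, mirroring list_hndrs[] in xen-ndctl. */
struct opt_handler {
    const char *option;
    int (*handler)(void);
};

static int show_raw(void) { puts("raw regions"); return 0; }

static const struct opt_handler handlers[] = {
    { "--raw", show_raw },
};

/*
 * Returns 0 on success, -1 for an unrecognized option.  A NULL or
 * "--all" option runs every handler, as handle_list() does.
 */
static int dispatch(const char *opt)
{
    bool all = !opt || !strcmp(opt, "--all");
    bool handled = false;
    int rc = 0;
    unsigned int i;

    for ( i = 0; i < sizeof(handlers) / sizeof(handlers[0]) && !rc; i++ )
        if ( all || !strcmp(opt, handlers[i].option) )
        {
            rc = handlers[i].handler();
            handled = true;
        }

    return handled ? rc : -1;
}
```

Note the short-circuit in the loop condition: when no option is given,
`all` is true, so `strcmp` is never reached with a NULL argument.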


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Graphical virtualization in intel® Atom is possible?

2017-08-17 Thread Haozhong Zhang
+Hongbo Wang from Intel GPU virtualization team

On 08/17/17 06:36 +, Asharaf Perinchikkal wrote:
> Hi All,
> 
> We are trying to do graphical virtualization in intel® Atom™ 
> E3845(MinnowBoard Turbot Quad-Core board) using xen.
> 
> Is it possible to do graphical virtualization in intel® Atom?
> 
> If yes,Could you please suggest what are versions of xen and linux 
> recommended to use and steps i need to follow?
> 
> Regards
> Asharaf P





Re: [Xen-devel] Building XenGT for Intel embedded board

2017-08-17 Thread Haozhong Zhang
+Hongbo Wang from Intel GPU virtualization team

On 08/10/17 22:47 +, Monisha Barooah wrote:
> Hi Everyone,
> I am currently exploring on bringing up XenGT for an Intel embedded board.
> 
> I came across this document relating to bringing up XenGT for the Sandy 
> Bridge/Ivy Bridge/Haswell platform
> https://www.intel.com/content/dam/www/public/us/en/documents/guides/xgengt-for-ivi-solutions-dev-kit-getting-started-guide.pdf
> 
> Our current Intel embedded board is up with an Yocto image integrated with 
> the Intel BSP for the board. The board uses ABL boot loader.
> 
> I saw in the XenGT document for the Sandy Bridge/Ivy Bridge/Haswell platform, 
> that there is mention of Qemu alone and no mention of any Intel BSPs. Don't 
> we require Intel BSP for dom0 kernel to work in the XenGT hypervisor? Or is a 
> generic version of Intel BSP integrated with the kernel image link 
> https://github.com/01org/XenGT-Preview-kernel.git.
> 
> Also, as we have an Yocto image in the Intel board, we might have to cross 
> compile the Kernel, Xen and Qemu builds as mentioned in the link above for 
> our Intel embedded board using a Linaro toolchain. If not, is there a way, we 
> can link this particular version of XenGT directly with our Yocto image for 
> the Intel board by including the meta-virtualization layer as mentioned in 
> the link http://git.yoctoproject.org/cgit/cgit.cgi/meta-virtualization/about/ 
> and doing 'bitbake xen image minimal'?
> 
> Please advise which is the correct route to take in this regard.
> 
> Thanks
> M
> 
> 
> 





Re: [Xen-devel] Is possible to do GPU virtualization in Intel® Atom?

2017-08-02 Thread Haozhong Zhang
+Hongbo from Intel GPU virtualization team

On 08/02/17 09:41 +, Asharaf Perinchikkal wrote:
> Is possible to achieve GPU virtualization in Intel® Atom using para 
> virtualization?
> 
> From: Roger Pau Monné [roger@citrix.com]
> Sent: Wednesday, August 02, 2017 1:04 PM
> To: Asharaf Perinchikkal
> Cc: xen-devel@lists.xen.org; Anoop Babu
> Subject: Re: [Xen-devel] Is possible to do GPU virtualization in Intel® Atom?
> 
> On Tue, Aug 01, 2017 at 10:01:01AM +, Asharaf Perinchikkal wrote:
> > Hi All,
> >
> >
> > In Intel® Atom™ E3845(MinnowBoard Turbot Quad-Core board) has only  support 
> > for Virtualization Technology (VT-x).
> >
> > No support for Intel® Virtualization Technology for Directed I/O (VT-d). 
> > [https://ark.intel.com/products/78475/Intel-Atom-Processor-E3845-2M-Cache-1_91-GHz]
> 
> Without VT-d (IOMMU) you won't be able to passthrough any physical
> device to a guest, so no, you won't be able to do GPU passthrough (at
> least in a safe way).
> 
> Roger.
> 



Re: [Xen-devel] [PATCH v9 6/7] tools/libxc: add support of injecting MC# to specified CPUs

2017-07-13 Thread Haozhong Zhang
On 07/12/17 09:25 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jul 12, 2017 at 10:04:39AM +0800, Haozhong Zhang wrote:
> > Though XEN_MC_inject_v2 allows injecting MC# to specified CPUs, the
> > current xc_mca_op() does not use this feature and not provide an
> > interface to callers. This commit add a new xc_mca_op_inject_v2() that
> > receives a cpumap providing the set of target CPUs.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
> > Acked-by: Wei Liu <wei.l...@citrix.com>
> > ---
> > Cc: Ian Jackson <ian.jack...@eu.citrix.com>
> > Cc: Wei Liu <wei.l...@citrix.com>
> > ---
> >  tools/libxc/include/xenctrl.h |  2 ++
> >  tools/libxc/xc_misc.c | 52 ++-
> >  2 files changed, 53 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> > index c51bb3b448..552a4fd47d 100644
> > --- a/tools/libxc/include/xenctrl.h
> > +++ b/tools/libxc/include/xenctrl.h
> > @@ -1809,6 +1809,8 @@ int xc_cpuid_apply_policy(xc_interface *xch,
> >  void xc_cpuid_to_str(const unsigned int *regs,
> >   char **strs); /* some strs[] may be NULL if ENOMEM */
> >  int xc_mca_op(xc_interface *xch, struct xen_mc *mc);
> > +int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags,
> > +xc_cpumap_t cpumap, unsigned int nr_cpus);
> >  #endif
> >  
> >  struct xc_px_val {
> > diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
> > index 88084fde30..2303293c6c 100644
> > --- a/tools/libxc/xc_misc.c
> > +++ b/tools/libxc/xc_misc.c
> > @@ -341,7 +341,57 @@ int xc_mca_op(xc_interface *xch, struct xen_mc *mc)
> >  xc_hypercall_bounce_post(xch, mc);
> >  return ret;
> >  }
> > -#endif
> > +
> > +int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags,
> > +xc_cpumap_t cpumap, unsigned int nr_bits)
> > +{
> > +int ret = -1;
> > +struct xen_mc mc_buf, *mc = &mc_buf;
> > +struct xen_mc_inject_v2 *inject = &mc->u.mc_inject_v2;
> > +
> > +DECLARE_HYPERCALL_BOUNCE(cpumap, 0, XC_HYPERCALL_BUFFER_BOUNCE_IN);
> > +DECLARE_HYPERCALL_BOUNCE(mc, sizeof(*mc), XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> > +
> > +memset(mc, 0, sizeof(*mc));
> > +
> > +if ( cpumap )
> > +{
> > +if ( !nr_bits )
> > +{
> > +errno = EINVAL;
> > +goto out;
> > +}
> > +
> > +HYPERCALL_BOUNCE_SET_SIZE(cpumap, (nr_bits + 7) / 8);
> 
> bitmap_size ?

nr_bits is of type unsigned int, while bitmap_size() requires a signed
int argument, though the number of CPUs passed via nr_bits in practice
can be represented by a signed int.

Haozhong
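
For reference, the byte-rounding under discussion can be illustrated
standalone. This is a sketch, not libxc or xen-mceinj code, and
`one_cpu_map` is a hypothetical helper:

```c
#include <stdlib.h>

/*
 * Sketch of the cpumap sizing used by callers of xc_mca_op_inject_v2():
 * a bitmap of nr_cpus bits occupies (nr_cpus + 7) / 8 bytes, and CPU n
 * maps to bit (n % 8) of byte (n / 8).
 */
static unsigned char *one_cpu_map(unsigned int cpu, unsigned int nr_cpus,
                                  size_t *size)
{
    unsigned char *map;

    if ( cpu >= nr_cpus )
        return NULL;

    *size = (nr_cpus + 7) / 8;           /* round bits up to whole bytes */
    map = calloc(1, *size);
    if ( map )
        map[cpu / 8] |= 1u << (cpu % 8); /* select exactly one CPU */
    return map;
}
```

For example, with 12 CPUs the map is 2 bytes, and CPU 10 lands in bit 2
of the second byte.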



Re: [Xen-devel] [PATCH v9 7/7] tools/xen-mceinj: add support of injecting LMCE

2017-07-12 Thread Haozhong Zhang
On 07/12/17 09:26 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jul 12, 2017 at 10:04:40AM +0800, Haozhong Zhang wrote:
> > If option '-l' or '--lmce' is specified and the host supports LMCE,
> > xen-mceinj will inject LMCE to CPU specified by '-c' (or CPU0 if '-c'
> > is not present).
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
> > Acked-by: Wei Liu <wei.l...@citrix.com>
> > ---
> > Cc: Ian Jackson <ian.jack...@eu.citrix.com>
> > Cc: Wei Liu <wei.l...@citrix.com>
> > ---
> >  tools/tests/mce-test/tools/xen-mceinj.c | 50 +++--
> >  1 file changed, 48 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/tests/mce-test/tools/xen-mceinj.c b/tools/tests/mce-test/tools/xen-mceinj.c
> > index bae5a46eb5..380e42190c 100644
> > --- a/tools/tests/mce-test/tools/xen-mceinj.c
> > +++ b/tools/tests/mce-test/tools/xen-mceinj.c
[..]
> >  
> > +static int inject_lmce(xc_interface *xc_handle, unsigned int cpu)
> > +{
> > +uint8_t *cpumap = NULL;
> > +size_t cpumap_size, line, shift;
> > +unsigned int nr_cpus;
> > +int ret;
> > +
> > +nr_cpus = mca_cpuinfo(xc_handle);
> > +if ( !nr_cpus )
> > +err(xc_handle, "Failed to get mca_cpuinfo");
> > +if ( cpu >= nr_cpus )
> > +err(xc_handle, "-c %u is larger than %u", cpu, nr_cpus - 1);
> > +
> > +cpumap_size = (nr_cpus + 7) / 8;
> 
> bitmap_size
>

IIUC, these bitmap_* functions/macros are libxc internals and should
not be used here.

Haozhong



[Xen-devel] [PATCH v9 7/7] tools/xen-mceinj: add support of injecting LMCE

2017-07-11 Thread Haozhong Zhang
If option '-l' or '--lmce' is specified and the host supports LMCE,
xen-mceinj will inject LMCE to CPU specified by '-c' (or CPU0 if '-c'
is not present).

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
Acked-by: Wei Liu <wei.l...@citrix.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/tests/mce-test/tools/xen-mceinj.c | 50 +++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/tools/tests/mce-test/tools/xen-mceinj.c b/tools/tests/mce-test/tools/xen-mceinj.c
index bae5a46eb5..380e42190c 100644
--- a/tools/tests/mce-test/tools/xen-mceinj.c
+++ b/tools/tests/mce-test/tools/xen-mceinj.c
@@ -56,6 +56,8 @@
 #define MSR_IA32_MC0_MISC0x0403
 #define MSR_IA32_MC0_CTL20x0280
 
+#define MCG_STATUS_LMCE  0x8
+
 struct mce_info {
 const char *description;
 uint8_t mcg_stat;
@@ -113,6 +115,7 @@ static struct mce_info mce_table[] = {
 #define LOGFILE stdout
 
 int dump;
+int lmce;
 struct xen_mc_msrinject msr_inj;
 
 static void Lprintf(const char *fmt, ...)
@@ -212,6 +215,35 @@ static int inject_mce(xc_interface *xc_handle, int cpu_nr)
 return xc_mca_op(xc_handle, &mc);
 }
 
+static int inject_lmce(xc_interface *xc_handle, unsigned int cpu)
+{
+uint8_t *cpumap = NULL;
+size_t cpumap_size, line, shift;
+unsigned int nr_cpus;
+int ret;
+
+nr_cpus = mca_cpuinfo(xc_handle);
+if ( !nr_cpus )
+err(xc_handle, "Failed to get mca_cpuinfo");
+if ( cpu >= nr_cpus )
+err(xc_handle, "-c %u is larger than %u", cpu, nr_cpus - 1);
+
+cpumap_size = (nr_cpus + 7) / 8;
+cpumap = malloc(cpumap_size);
+if ( !cpumap )
+err(xc_handle, "Failed to allocate cpumap\n");
+memset(cpumap, 0, cpumap_size);
+line = cpu / 8;
+shift = cpu % 8;
+memset(cpumap + line, 1 << shift, 1);
+
+ret = xc_mca_op_inject_v2(xc_handle, XEN_MC_INJECT_TYPE_LMCE,
+  cpumap, cpumap_size * 8);
+
+free(cpumap);
+return ret;
+}
+
 static uint64_t bank_addr(int bank, int type)
 {
 uint64_t addr;
@@ -330,8 +362,15 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce,
   uint32_t cpu_nr, uint32_t domain, uint64_t gaddr)
 {
 int ret = 0;
+uint8_t mcg_status = mce->mcg_stat;
 
-ret = inject_mcg_status(xc_handle, cpu_nr, mce->mcg_stat, domain);
+if ( lmce )
+{
+if ( mce->cmci )
+err(xc_handle, "No support to inject CMCI as LMCE");
+mcg_status |= MCG_STATUS_LMCE;
+}
+ret = inject_mcg_status(xc_handle, cpu_nr, mcg_status, domain);
 if ( ret )
 err(xc_handle, "Failed to inject MCG_STATUS MSR");
 
@@ -354,6 +393,8 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce,
 err(xc_handle, "Failed to inject MSR");
 if ( mce->cmci )
 ret = inject_cmci(xc_handle, cpu_nr);
+else if ( lmce )
+ret = inject_lmce(xc_handle, cpu_nr);
 else
 ret = inject_mce(xc_handle, cpu_nr);
 if ( ret )
@@ -393,6 +434,7 @@ static struct option opts[] = {
 {"dump", 0, 0, 'D'},
 {"help", 0, 0, 'h'},
 {"page", 0, 0, 'p'},
+{"lmce", 0, 0, 'l'},
 {"", 0, 0, '\0'}
 };
 
@@ -409,6 +451,7 @@ static void help(void)
"  -d, --domain=DOMID   target domain, the default is Xen itself\n"
"  -h, --help   print this page\n"
"  -p, --page=ADDR  physical address to report\n"
+   "  -l, --lmce   inject as LMCE (Intel only)\n"
"  -t, --type=ERROR error type\n");
 
 for ( i = 0; i < MCE_TABLE_SIZE; i++ )
@@ -438,7 +481,7 @@ int main(int argc, char *argv[])
 }
 
 while ( 1 ) {
-c = getopt_long(argc, argv, "c:Dd:t:hp:", opts, &opt_index);
+c = getopt_long(argc, argv, "c:Dd:t:hp:l", opts, &opt_index);
 if ( c == -1 )
 break;
 switch ( c ) {
@@ -463,6 +506,9 @@ int main(int argc, char *argv[])
 case 't':
 type = strtol(optarg, NULL, 0);
 break;
+case 'l':
+lmce = 1;
+break;
 case 'h':
 default:
 help();
-- 
2.11.0




[Xen-devel] [PATCH v9 5/7] xen/mce: add support of vLMCE injection to XEN_MC_inject_v2

2017-07-11 Thread Haozhong Zhang
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 xen/arch/x86/cpu/mcheck/mce.c | 24 +++-
 xen/include/public/arch-x86/xen-mca.h |  1 +
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index ee04fb54ff..30525dd78b 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -1485,11 +1485,12 @@ long do_mca(XEN_GUEST_HANDLE_PARAM(xen_mc_t) u_xen_mc)
 {
 const cpumask_t *cpumap;
 cpumask_var_t cmv;
+bool broadcast = op->u.mc_inject_v2.flags & XEN_MC_INJECT_CPU_BROADCAST;
 
 if (nr_mce_banks == 0)
 return x86_mcerr("do_mca #MC", -ENODEV);
 
-if ( op->u.mc_inject_v2.flags & XEN_MC_INJECT_CPU_BROADCAST )
+if ( broadcast )
+cpumap = &cpu_online_map;
 else
 {
@@ -1529,6 +1530,27 @@ long do_mca(XEN_GUEST_HANDLE_PARAM(xen_mc_t) u_xen_mc)
 }
 break;
 
+case XEN_MC_INJECT_TYPE_LMCE:
+if ( !lmce_support )
+{
+ret = x86_mcerr("No LMCE support", -EINVAL);
+break;
+}
+if ( broadcast )
+{
+ret = x86_mcerr("Broadcast cannot be used with LMCE", -EINVAL);
+break;
+}
+/* Ensure at most one CPU is specified. */
+if ( nr_cpu_ids > cpumask_next(cpumask_first(cpumap), cpumap) )
+{
+ret = x86_mcerr("More than one CPU specified for LMCE",
+-EINVAL);
+break;
+}
+on_selected_cpus(cpumap, x86_mc_mceinject, NULL, 1);
+break;
+
 default:
 ret = x86_mcerr("Wrong mca type\n", -EINVAL);
 break;
diff --git a/xen/include/public/arch-x86/xen-mca.h b/xen/include/public/arch-x86/xen-mca.h
index 7db990723b..dc35267249 100644
--- a/xen/include/public/arch-x86/xen-mca.h
+++ b/xen/include/public/arch-x86/xen-mca.h
@@ -414,6 +414,7 @@ struct xen_mc_mceinject {
 #define XEN_MC_INJECT_TYPE_MASK 0x7
 #define XEN_MC_INJECT_TYPE_MCE  0x0
 #define XEN_MC_INJECT_TYPE_CMCI 0x1
+#define XEN_MC_INJECT_TYPE_LMCE 0x2
 
 #define XEN_MC_INJECT_CPU_BROADCAST 0x8
 
-- 
2.11.0
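The "at most one CPU" test in the hunk above walks the cpumask with
cpumask_first()/cpumask_next(). On a mask that fits a single machine
word, the same idea reduces to the classic clear-lowest-bit check; the
sketch below is that simplification only, not Xen code:

```c
#include <stdbool.h>

/*
 * Sketch: for a single-word mask, "at most one bit set" is equivalent
 * to the cpumask walk in do_mca()'s XEN_MC_INJECT_TYPE_LMCE case.
 */
static bool at_most_one_cpu(unsigned long mask)
{
    /* Clearing the lowest set bit must leave zero. */
    return (mask & (mask - 1)) == 0;
}
```

Xen uses the cpumask helpers instead because NR_CPUS may exceed the
width of one word; the bit trick is shown here only to make the
intent of the check concrete.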




[Xen-devel] [PATCH v9 2/7] x86/vmce: emulate MSR_IA32_MCG_EXT_CTL

2017-07-11 Thread Haozhong Zhang
If MCG_LMCE_P is present in guest MSR_IA32_MCG_CAP, then allow guest
to read/write MSR_IA32_MCG_EXT_CTL.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 xen/arch/x86/cpu/mcheck/vmce.c | 34 +-
 xen/arch/x86/domctl.c  |  2 ++
 xen/include/asm-x86/mce.h  |  1 +
 xen/include/public/arch-x86/hvm/save.h |  1 +
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index 1356f611ab..060e2d0582 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -91,6 +91,7 @@ int vmce_restore_vcpu(struct vcpu *v, const struct hvm_vmce_vcpu *ctxt)
 v->arch.vmce.mcg_cap = ctxt->caps;
 v->arch.vmce.bank[0].mci_ctl2 = ctxt->mci_ctl2_bank0;
 v->arch.vmce.bank[1].mci_ctl2 = ctxt->mci_ctl2_bank1;
+v->arch.vmce.mcg_ext_ctl = ctxt->mcg_ext_ctl;
 
 return 0;
 }
@@ -200,6 +201,26 @@ int vmce_rdmsr(uint32_t msr, uint64_t *val)
 mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_CTL %#"PRIx64"\n", cur, *val);
 break;
 
+case MSR_IA32_MCG_EXT_CTL:
+/*
+ * If MCG_LMCE_P is present in guest MSR_IA32_MCG_CAP, the LMCE and LOCK
+ * bits are always set in guest MSR_IA32_FEATURE_CONTROL by Xen, so it
+ * does not need to check them here.
+ */
+if ( cur->arch.vmce.mcg_cap & MCG_LMCE_P )
+{
+*val = cur->arch.vmce.mcg_ext_ctl;
+mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_EXT_CTL %#"PRIx64"\n",
+   cur, *val);
+}
+else
+{
+ret = -1;
+mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_EXT_CTL, not supported\n",
+   cur);
+}
+break;
+
 default:
 ret = mce_bank_msr(cur, msr) ? bank_mce_rdmsr(cur, msr, val) : 0;
 break;
@@ -309,6 +330,16 @@ int vmce_wrmsr(uint32_t msr, uint64_t val)
 mce_printk(MCE_VERBOSE, "MCE: %pv: MCG_CAP is r/o\n", cur);
 break;
 
+case MSR_IA32_MCG_EXT_CTL:
+if ( (cur->arch.vmce.mcg_cap & MCG_LMCE_P) &&
+ !(val & ~MCG_EXT_CTL_LMCE_EN) )
+cur->arch.vmce.mcg_ext_ctl = val;
+else
+ret = -1;
+mce_printk(MCE_VERBOSE, "MCE: %pv: wr MCG_EXT_CTL %"PRIx64"%s\n",
+   cur, val, (ret == -1) ? ", not supported" : "");
+break;
+
 default:
 ret = mce_bank_msr(cur, msr) ? bank_mce_wrmsr(cur, msr, val) : 0;
 break;
@@ -327,7 +358,8 @@ static int vmce_save_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
 struct hvm_vmce_vcpu ctxt = {
 .caps = v->arch.vmce.mcg_cap,
 .mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2,
-.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2
+.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2,
+.mcg_ext_ctl = v->arch.vmce.mcg_ext_ctl,
 };
 
 err = hvm_save_entry(VMCE_VCPU, v->vcpu_id, h, &ctxt);
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 3637d32669..3628af2f70 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -315,6 +315,7 @@ static int vcpu_set_vmce(struct vcpu *v,
 
 static const unsigned int valid_sizes[] = {
 sizeof(evc->vmce),
+VMCE_SIZE(mci_ctl2_bank1),
 VMCE_SIZE(caps),
 };
 #undef VMCE_SIZE
@@ -908,6 +909,7 @@ long arch_do_domctl(
 evc->vmce.caps = v->arch.vmce.mcg_cap;
 evc->vmce.mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2;
 evc->vmce.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2;
+evc->vmce.mcg_ext_ctl = v->arch.vmce.mcg_ext_ctl;
 
 ret = 0;
 vcpu_unpause(v);
diff --git a/xen/include/asm-x86/mce.h b/xen/include/asm-x86/mce.h
index 56ad1f92dd..35f9962638 100644
--- a/xen/include/asm-x86/mce.h
+++ b/xen/include/asm-x86/mce.h
@@ -27,6 +27,7 @@ struct vmce_bank {
 struct vmce {
 uint64_t mcg_cap;
 uint64_t mcg_status;
+uint64_t mcg_ext_ctl;
 spinlock_t lock;
 struct vmce_bank bank[GUEST_MC_BANK_NUM];
 };
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 816973b9c2..fd7bf3fb38 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -610,6 +610,7 @@ struct hvm_vmce_vcpu {
 uint64_t caps;
 uint64_t mci_ctl2_bank0;
 uint64_t mci_ctl2_bank1;
+uint64_t mcg_ext_ctl;
 };
 
 DECLARE_HVM_SAVE_TYPE(VMCE_VCPU, 18, struct hvm_vmce_vcpu);
-- 
2.11.0




[Xen-devel] [PATCH v9 6/7] tools/libxc: add support of injecting MC# to specified CPUs

2017-07-11 Thread Haozhong Zhang
Though XEN_MC_inject_v2 allows injecting MC# to specified CPUs, the
current xc_mca_op() neither uses this feature nor provides an
interface to callers. This commit adds a new xc_mca_op_inject_v2()
that receives a cpumap providing the set of target CPUs.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
Acked-by: Wei Liu <wei.l...@citrix.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/libxc/include/xenctrl.h |  2 ++
 tools/libxc/xc_misc.c | 52 ++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index c51bb3b448..552a4fd47d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1809,6 +1809,8 @@ int xc_cpuid_apply_policy(xc_interface *xch,
 void xc_cpuid_to_str(const unsigned int *regs,
  char **strs); /* some strs[] may be NULL if ENOMEM */
 int xc_mca_op(xc_interface *xch, struct xen_mc *mc);
+int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags,
+xc_cpumap_t cpumap, unsigned int nr_cpus);
 #endif
 
 struct xc_px_val {
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 88084fde30..2303293c6c 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -341,7 +341,57 @@ int xc_mca_op(xc_interface *xch, struct xen_mc *mc)
 xc_hypercall_bounce_post(xch, mc);
 return ret;
 }
-#endif
+
+int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags,
+xc_cpumap_t cpumap, unsigned int nr_bits)
+{
+int ret = -1;
+struct xen_mc mc_buf, *mc = &mc_buf;
+struct xen_mc_inject_v2 *inject = &mc->u.mc_inject_v2;
+
+DECLARE_HYPERCALL_BOUNCE(cpumap, 0, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+DECLARE_HYPERCALL_BOUNCE(mc, sizeof(*mc), XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
+
+memset(mc, 0, sizeof(*mc));
+
+if ( cpumap )
+{
+if ( !nr_bits )
+{
+errno = EINVAL;
+goto out;
+}
+
+HYPERCALL_BOUNCE_SET_SIZE(cpumap, (nr_bits + 7) / 8);
+if ( xc_hypercall_bounce_pre(xch, cpumap) )
+{
+PERROR("Could not bounce cpumap memory buffer");
+goto out;
+}
+set_xen_guest_handle(inject->cpumap.bitmap, cpumap);
+inject->cpumap.nr_bits = nr_bits;
+}
+
+inject->flags = flags;
+mc->cmd = XEN_MC_inject_v2;
+mc->interface_version = XEN_MCA_INTERFACE_VERSION;
+
+if ( xc_hypercall_bounce_pre(xch, mc) )
+{
+PERROR("Could not bounce xen_mc memory buffer");
+goto out_free_cpumap;
+}
+
+ret = xencall1(xch->xcall, __HYPERVISOR_mca, HYPERCALL_BUFFER_AS_ARG(mc));
+
+xc_hypercall_bounce_post(xch, mc);
+out_free_cpumap:
+if ( cpumap )
+xc_hypercall_bounce_post(xch, cpumap);
+out:
+return ret;
+}
+#endif /* __i386__ || __x86_64__ */
 
 int xc_perfc_reset(xc_interface *xch)
 {
-- 
2.11.0




[Xen-devel] [PATCH v9 1/7] x86/domctl: generalize the restore of vMCE parameters

2017-07-11 Thread Haozhong Zhang
vMCE parameters in struct xen_domctl_ext_vcpucontext were extended in
the past and are likely to be extended in the future. When migrating a
PV domain from an old Xen, XEN_DOMCTL_set_ext_vcpucontext should handle
the differences.

Instead of adding ad-hoc handling code at each extension, we introduce
an array that records the sizes of the current and all past versions of
the vMCE parameters, and search it for the largest size that does not
exceed the size of the passed-in parameters, to determine which vMCE
parameters will be restored. If the vMCE parameters are extended in the
future, we only need to adapt the array to reflect the extension.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>

Changes in v9:
 * Rename "param" to "field" in macro VMCE_SIZE().
 * Use min(..., sizeof(evc->vmce)) to get the size of vMCE parameters.
---
 xen/arch/x86/domctl.c | 55 +++
 1 file changed, 38 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 7fa58b49af..3637d32669 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -302,6 +302,43 @@ static int update_domain_cpuid_info(struct domain *d,
 return 0;
 }
 
+static int vcpu_set_vmce(struct vcpu *v,
+ const struct xen_domctl_ext_vcpucontext *evc)
+{
+/*
+ * Sizes of vMCE parameters used by the current and past versions
+ * of Xen in descending order. If vMCE parameters are extended,
+ * remember to add the old size to this array by VMCE_SIZE().
+ */
+#define VMCE_SIZE(field) \
+(offsetof(typeof(evc->vmce), field) + sizeof(evc->vmce.field))
+
+static const unsigned int valid_sizes[] = {
+sizeof(evc->vmce),
+VMCE_SIZE(caps),
+};
+#undef VMCE_SIZE
+
+struct hvm_vmce_vcpu vmce = { };
+unsigned int evc_vmce_size =
+min(evc->size - offsetof(typeof(*evc), mcg_cap), sizeof(evc->vmce));
+unsigned int i = 0;
+
+BUILD_BUG_ON(offsetof(typeof(*evc), mcg_cap) !=
+ offsetof(typeof(*evc), vmce.caps));
+BUILD_BUG_ON(sizeof(evc->mcg_cap) != sizeof(evc->vmce.caps));
+
+while ( i < ARRAY_SIZE(valid_sizes) && evc_vmce_size < valid_sizes[i] )
+++i;
+
+if ( i == ARRAY_SIZE(valid_sizes) )
+return 0;
+
+memcpy(&vmce, &evc->vmce, valid_sizes[i]);
+
+return vmce_restore_vcpu(v, &vmce);
+}
+
 void arch_get_domain_info(const struct domain *d,
   struct xen_domctl_getdomaininfo *info)
 {
@@ -912,23 +949,7 @@ long arch_do_domctl(
 else
 domain_pause(d);
 
-BUILD_BUG_ON(offsetof(struct xen_domctl_ext_vcpucontext,
-  mcg_cap) !=
- offsetof(struct xen_domctl_ext_vcpucontext,
-  vmce.caps));
-BUILD_BUG_ON(sizeof(evc->mcg_cap) != sizeof(evc->vmce.caps));
-if ( evc->size >= offsetof(typeof(*evc), vmce) +
-  sizeof(evc->vmce) )
-ret = vmce_restore_vcpu(v, &evc->vmce);
-else if ( evc->size >= offsetof(typeof(*evc), mcg_cap) +
-   sizeof(evc->mcg_cap) )
-{
-struct hvm_vmce_vcpu vmce = { .caps = evc->mcg_cap };
-
-ret = vmce_restore_vcpu(v, &vmce);
-}
-else
-ret = 0;
+ret = vcpu_set_vmce(v, evc);
 
 domain_unpause(d);
 }
-- 
2.11.0




[Xen-devel] [PATCH v9 0/7] Add LMCE support

2017-07-11 Thread Haozhong Zhang
Changes in v9:
 * Minor updates in patch 1 per Jan's comments.
 * Collect Jan's R-b in patch 2.

Haozhong Zhang (7):
  [M   ] x86/domctl: generalize the restore of vMCE parameters
  [  R ] x86/vmce: emulate MSR_IA32_MCG_EXT_CTL
  [  R ] x86/vmce: enable injecting LMCE to guest on Intel host
  [  RA] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP
  [  R ] xen/mce: add support of vLMCE injection to XEN_MC_inject_v2
  [   A] tools/libxc: add support of injecting MC# to specified CPUs
  [   A] tools/xen-mceinj: add support of injecting LMCE

 N: new in this version
 M: modified in this version
 R: got R-b
 A: got A-b

 docs/man/xl.cfg.pod.5.in| 24 +
 tools/libxc/include/xenctrl.h   |  2 ++
 tools/libxc/xc_misc.c   | 52 ++-
 tools/libxc/xc_sr_save_x86_hvm.c|  1 +
 tools/libxl/libxl.h |  7 
 tools/libxl/libxl_dom.c | 15 
 tools/libxl/libxl_types.idl |  1 +
 tools/tests/mce-test/tools/xen-mceinj.c | 50 --
 tools/xl/xl_parse.c | 31 ++--
 xen/arch/x86/cpu/mcheck/mcaction.c  | 23 
 xen/arch/x86/cpu/mcheck/mce.c   | 24 -
 xen/arch/x86/cpu/mcheck/mce.h   |  1 +
 xen/arch/x86/cpu/mcheck/mce_intel.c |  2 +-
 xen/arch/x86/cpu/mcheck/vmce.c  | 64 +++--
 xen/arch/x86/cpu/mcheck/vmce.h  |  2 +-
 xen/arch/x86/domctl.c   | 57 -
 xen/arch/x86/hvm/hvm.c  |  5 +++
 xen/include/asm-x86/mce.h   |  2 ++
 xen/include/public/arch-x86/hvm/save.h  |  1 +
 xen/include/public/arch-x86/xen-mca.h   |  1 +
 xen/include/public/hvm/params.h |  7 +++-
 21 files changed, 336 insertions(+), 36 deletions(-)

-- 
2.11.0




[Xen-devel] [PATCH v9 4/7] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP

2017-07-11 Thread Haozhong Zhang
If LMCE is supported by the host and ' mca_caps = [ "lmce" ] ' is present
in the xl config, the LMCE capability will be exposed in the guest's
MSR_IA32_MCG_CAP. By default, LMCE is not exposed to the guest, so as to
preserve backwards migration compatibility.
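For illustration, the new option is a list in the xl domain configuration; a minimal fragment, using the syntax quoted in the commit message above, is:

```
# Expose LMCE in the guest's MSR_IA32_MCG_CAP (requires host LMCE support).
mca_caps = [ "lmce" ]
```

Omitting the option keeps today's default of not advertising LMCE, so existing guests remain migratable to older hosts.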

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com> for hypervisor side
Acked-by: Wei Liu <wei.l...@citrix.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 docs/man/xl.cfg.pod.5.in| 24 
 tools/libxc/xc_sr_save_x86_hvm.c|  1 +
 tools/libxl/libxl.h |  7 +++
 tools/libxl/libxl_dom.c | 15 +++
 tools/libxl/libxl_types.idl |  1 +
 tools/xl/xl_parse.c | 31 +--
 xen/arch/x86/cpu/mcheck/mce.h   |  1 +
 xen/arch/x86/cpu/mcheck/mce_intel.c |  2 +-
 xen/arch/x86/cpu/mcheck/vmce.c  | 19 ++-
 xen/arch/x86/hvm/hvm.c  |  5 +
 xen/include/asm-x86/mce.h   |  1 +
 xen/include/public/hvm/params.h |  7 ++-
 12 files changed, 109 insertions(+), 5 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index ff3203550f..79cb2eaea7 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -2173,6 +2173,30 @@ natively or via hardware backwards compatibility support.
 
 =back
 
+=head3 x86
+
+=over 4
+
+=item B

[Xen-devel] [PATCH v9 3/7] x86/vmce: enable injecting LMCE to guest on Intel host

2017-07-11 Thread Haozhong Zhang
Inject LMCE to the guest if the host MCE is an LMCE and the affected vcpu
is known. Otherwise, broadcast MCE to all vcpus on an Intel host.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 xen/arch/x86/cpu/mcheck/mcaction.c | 23 ---
 xen/arch/x86/cpu/mcheck/vmce.c | 11 ++-
 xen/arch/x86/cpu/mcheck/vmce.h |  2 +-
 3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mcaction.c b/xen/arch/x86/cpu/mcheck/mcaction.c
index ca17d22bd8..f959bed2cb 100644
--- a/xen/arch/x86/cpu/mcheck/mcaction.c
+++ b/xen/arch/x86/cpu/mcheck/mcaction.c
@@ -44,6 +44,7 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
 unsigned long mfn, gfn;
 uint32_t status;
 int vmce_vcpuid;
+unsigned int mc_vcpuid;
 
 if (!mc_check_addr(bank->mc_status, bank->mc_misc, MC_ADDR_PHYSICAL)) {
 dprintk(XENLOG_WARNING,
@@ -88,18 +89,26 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
 goto vmce_failed;
 }
 
-if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
-global->mc_vcpuid == XEN_MC_VCPUID_INVALID)
+mc_vcpuid = global->mc_vcpuid;
+if (mc_vcpuid == XEN_MC_VCPUID_INVALID ||
+/*
+ * Because MC# may happen asynchronously with the actual
+ * operation that triggers the error, the domain ID as
+ * well as the vCPU ID collected in 'global' at MC# are
+ * not always precise. In that case, fallback to broadcast.
+ */
+global->mc_domid != bank->mc_domid ||
+(boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+ (!(global->mc_gstatus & MCG_STATUS_LMCE) ||
+  !(d->vcpu[mc_vcpuid]->arch.vmce.mcg_ext_ctl &
+MCG_EXT_CTL_LMCE_EN
 vmce_vcpuid = VMCE_INJECT_BROADCAST;
 else
-vmce_vcpuid = global->mc_vcpuid;
+vmce_vcpuid = mc_vcpuid;
 
 bank->mc_addr = gfn << PAGE_SHIFT |
   (bank->mc_addr & (PAGE_SIZE -1 ));
-/* TODO: support injecting LMCE */
-if (fill_vmsr_data(bank, d,
-   global->mc_gstatus & ~MCG_STATUS_LMCE,
-   vmce_vcpuid == VMCE_INJECT_BROADCAST))
+if (fill_vmsr_data(bank, d, global->mc_gstatus, vmce_vcpuid))
 {
 mce_printk(MCE_QUIET, "Fill vMCE# data for DOM%d "
   "failed\n", bank->mc_domid);
diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index 060e2d0582..e2b3c5b8cc 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -465,14 +465,23 @@ static int vcpu_fill_mc_msrs(struct vcpu *v, uint64_t mcg_status,
 }
 
 int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
-   uint64_t gstatus, bool broadcast)
+   uint64_t gstatus, int vmce_vcpuid)
 {
 struct vcpu *v = d->vcpu[0];
+bool broadcast = (vmce_vcpuid == VMCE_INJECT_BROADCAST);
 int ret, err;
 
 if ( mc_bank->mc_domid == DOMID_INVALID )
 return -EINVAL;
 
+if ( broadcast )
+gstatus &= ~MCG_STATUS_LMCE;
+else if ( gstatus & MCG_STATUS_LMCE )
+{
+ASSERT(vmce_vcpuid >= 0 && vmce_vcpuid < d->max_vcpus);
+v = d->vcpu[vmce_vcpuid];
+}
+
 /*
  * vMCE with the actual error information is injected to vCPU0,
  * and, if broadcast is required, we choose to inject less severe
diff --git a/xen/arch/x86/cpu/mcheck/vmce.h b/xen/arch/x86/cpu/mcheck/vmce.h
index 74f6381460..2797e00275 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.h
+++ b/xen/arch/x86/cpu/mcheck/vmce.h
@@ -17,7 +17,7 @@ int vmce_amd_rdmsr(const struct vcpu *, uint32_t msr, uint64_t *val);
 int vmce_amd_wrmsr(struct vcpu *, uint32_t msr, uint64_t val);
 
 int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
-   uint64_t gstatus, bool broadcast);
+   uint64_t gstatus, int vmce_vcpuid);
 
 #define VMCE_INJECT_BROADCAST (-1)
 int inject_vmce(struct domain *d, int vcpu);
-- 
2.11.0




[Xen-devel] [PATCH v8 0/7] Add LMCE support

2017-07-09 Thread Haozhong Zhang
Changes in v8:
 * Adjust the generalization of setting vMCE parameters in patch 1&2.
 * Other patches are not changed.

Haozhong Zhang (7):
  [M   ] 1/7 x86/domctl: generalize the restore of vMCE parameters
  [ M  ] 2/7 x86/vmce: emulate MSR_IA32_MCG_EXT_CTL
  [  R ] 3/7 x86/vmce: enable injecting LMCE to guest on Intel host
  [  RA] 4/7 x86/vmce, tools/libxl: expose LMCE capability in guest 
MSR_IA32_MCG_CAP
  [  R ] 5/7 xen/mce: add support of vLMCE injection to XEN_MC_inject_v2
  [   A] 6/7 tools/libxc: add support of injecting MC# to specified CPUs
  [   A] 7/7 tools/xen-mceinj: add support of injecting LMCE

 N: new in this version
 M: modified in this version
 R: got R-b
 A: got A-b

 docs/man/xl.cfg.pod.5.in| 24 +
 tools/libxc/include/xenctrl.h   |  2 ++
 tools/libxc/xc_misc.c   | 52 ++-
 tools/libxc/xc_sr_save_x86_hvm.c|  1 +
 tools/libxl/libxl.h |  7 
 tools/libxl/libxl_dom.c | 15 
 tools/libxl/libxl_types.idl |  1 +
 tools/tests/mce-test/tools/xen-mceinj.c | 50 --
 tools/xl/xl_parse.c | 31 ++--
 xen/arch/x86/cpu/mcheck/mcaction.c  | 23 
 xen/arch/x86/cpu/mcheck/mce.c   | 24 -
 xen/arch/x86/cpu/mcheck/mce.h   |  1 +
 xen/arch/x86/cpu/mcheck/mce_intel.c |  2 +-
 xen/arch/x86/cpu/mcheck/vmce.c  | 64 +++--
 xen/arch/x86/cpu/mcheck/vmce.h  |  2 +-
 xen/arch/x86/domctl.c   | 56 -
 xen/arch/x86/hvm/hvm.c  |  5 +++
 xen/include/asm-x86/mce.h   |  2 ++
 xen/include/public/arch-x86/hvm/save.h  |  1 +
 xen/include/public/arch-x86/xen-mca.h   |  1 +
 xen/include/public/hvm/params.h |  7 +++-
 21 files changed, 335 insertions(+), 36 deletions(-)

-- 
2.11.0




[Xen-devel] [PATCH v8 1/7] x86/domctl: generalize the restore of vMCE parameters

2017-07-09 Thread Haozhong Zhang
vMCE parameters in struct xen_domctl_ext_vcpucontext were extended in
the past and are likely to be extended in the future. When migrating a
PV domain from an old Xen, XEN_DOMCTL_set_ext_vcpucontext should handle
the differences.

Instead of adding ad-hoc handling code at each extension, we introduce
an array that records the sizes of the current and all past versions of
the vMCE parameters, and search it for the largest size that does not
exceed the size of the passed-in parameters, to determine which vMCE
parameters will be restored. If the vMCE parameters are extended in the
future, we only need to adapt the array to reflect the extension.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>

Changes in v8:
 * Rename valid_vmce_size[] tp valid_sizes[].
 * Use offsetof() + sizeof() in valid_sizes[] and macroize it.
 * Remove element 0 from valid_sizes[].
 * int i --> unsigned int i
 * Leave a blank line before the ending return.
---
 xen/arch/x86/domctl.c | 54 +++
 1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 7fa58b49af..125537b96d 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -302,6 +302,42 @@ static int update_domain_cpuid_info(struct domain *d,
 return 0;
 }
 
+static int vcpu_set_vmce(struct vcpu *v,
+ const struct xen_domctl_ext_vcpucontext *evc)
+{
+/*
+ * Sizes of vMCE parameters used by the current and past versions
+ * of Xen in descending order. If vMCE parameters are extended,
+ * remember to add the old size to this array by VMCE_SIZE().
+ */
+#define VMCE_SIZE(param) \
+(offsetof(typeof(evc->vmce), param) + sizeof(evc->vmce.param))
+
+static const unsigned int valid_sizes[] = {
+sizeof(evc->vmce),
+VMCE_SIZE(caps),
+};
+#undef VMCE_SIZE
+
+struct hvm_vmce_vcpu vmce = { };
+unsigned int evc_vmce_size = evc->size - offsetof(typeof(*evc), mcg_cap);
+unsigned int i = 0;
+
+BUILD_BUG_ON(offsetof(typeof(*evc), mcg_cap) !=
+ offsetof(typeof(*evc), vmce.caps));
+BUILD_BUG_ON(sizeof(evc->mcg_cap) != sizeof(evc->vmce.caps));
+
+while ( i < ARRAY_SIZE(valid_sizes) && evc_vmce_size < valid_sizes[i] )
+++i;
+
+if ( i == ARRAY_SIZE(valid_sizes) )
+return 0;
+
+memcpy(&vmce, &evc->vmce, valid_sizes[i]);
+
+return vmce_restore_vcpu(v, &vmce);
+}
+
 void arch_get_domain_info(const struct domain *d,
   struct xen_domctl_getdomaininfo *info)
 {
@@ -912,23 +948,7 @@ long arch_do_domctl(
 else
 domain_pause(d);
 
-BUILD_BUG_ON(offsetof(struct xen_domctl_ext_vcpucontext,
-  mcg_cap) !=
- offsetof(struct xen_domctl_ext_vcpucontext,
-  vmce.caps));
-BUILD_BUG_ON(sizeof(evc->mcg_cap) != sizeof(evc->vmce.caps));
-if ( evc->size >= offsetof(typeof(*evc), vmce) +
-  sizeof(evc->vmce) )
-ret = vmce_restore_vcpu(v, &evc->vmce);
-else if ( evc->size >= offsetof(typeof(*evc), mcg_cap) +
-   sizeof(evc->mcg_cap) )
-{
-struct hvm_vmce_vcpu vmce = { .caps = evc->mcg_cap };
-
-ret = vmce_restore_vcpu(v, &vmce);
-}
-else
-ret = 0;
+ret = vcpu_set_vmce(v, evc);
 
 domain_unpause(d);
 }
-- 
2.11.0




[Xen-devel] [PATCH v8 2/7] x86/vmce: emulate MSR_IA32_MCG_EXT_CTL

2017-07-09 Thread Haozhong Zhang
If MCG_LMCE_P is present in the guest's MSR_IA32_MCG_CAP, then allow the
guest to read/write MSR_IA32_MCG_EXT_CTL.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>

Changes in v8:
 * Use offsetof() + sizeof() (VMCE_SIZE()) in valid_sizes[].
---
 xen/arch/x86/cpu/mcheck/vmce.c | 34 +-
 xen/arch/x86/domctl.c  |  2 ++
 xen/include/asm-x86/mce.h  |  1 +
 xen/include/public/arch-x86/hvm/save.h |  1 +
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index 1356f611ab..060e2d0582 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -91,6 +91,7 @@ int vmce_restore_vcpu(struct vcpu *v, const struct hvm_vmce_vcpu *ctxt)
 v->arch.vmce.mcg_cap = ctxt->caps;
 v->arch.vmce.bank[0].mci_ctl2 = ctxt->mci_ctl2_bank0;
 v->arch.vmce.bank[1].mci_ctl2 = ctxt->mci_ctl2_bank1;
+v->arch.vmce.mcg_ext_ctl = ctxt->mcg_ext_ctl;
 
 return 0;
 }
@@ -200,6 +201,26 @@ int vmce_rdmsr(uint32_t msr, uint64_t *val)
 mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_CTL %#"PRIx64"\n", cur, *val);
 break;
 
+case MSR_IA32_MCG_EXT_CTL:
+/*
+ * If MCG_LMCE_P is present in guest MSR_IA32_MCG_CAP, the LMCE and LOCK
+ * bits are always set in guest MSR_IA32_FEATURE_CONTROL by Xen, so it
+ * does not need to check them here.
+ */
+if ( cur->arch.vmce.mcg_cap & MCG_LMCE_P )
+{
+*val = cur->arch.vmce.mcg_ext_ctl;
+mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_EXT_CTL %#"PRIx64"\n",
+   cur, *val);
+}
+else
+{
+ret = -1;
+mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_EXT_CTL, not supported\n",
+   cur);
+}
+break;
+
 default:
 ret = mce_bank_msr(cur, msr) ? bank_mce_rdmsr(cur, msr, val) : 0;
 break;
@@ -309,6 +330,16 @@ int vmce_wrmsr(uint32_t msr, uint64_t val)
 mce_printk(MCE_VERBOSE, "MCE: %pv: MCG_CAP is r/o\n", cur);
 break;
 
+case MSR_IA32_MCG_EXT_CTL:
+if ( (cur->arch.vmce.mcg_cap & MCG_LMCE_P) &&
+ !(val & ~MCG_EXT_CTL_LMCE_EN) )
+cur->arch.vmce.mcg_ext_ctl = val;
+else
+ret = -1;
+mce_printk(MCE_VERBOSE, "MCE: %pv: wr MCG_EXT_CTL %"PRIx64"%s\n",
+   cur, val, (ret == -1) ? ", not supported" : "");
+break;
+
 default:
 ret = mce_bank_msr(cur, msr) ? bank_mce_wrmsr(cur, msr, val) : 0;
 break;
@@ -327,7 +358,8 @@ static int vmce_save_vcpu_ctxt(struct domain *d, hvm_domain_context_t *h)
 struct hvm_vmce_vcpu ctxt = {
 .caps = v->arch.vmce.mcg_cap,
 .mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2,
-.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2
+.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2,
+.mcg_ext_ctl = v->arch.vmce.mcg_ext_ctl,
 };
 
 err = hvm_save_entry(VMCE_VCPU, v->vcpu_id, h, &ctxt);
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 125537b96d..5f8b5a5629 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -315,6 +315,7 @@ static int vcpu_set_vmce(struct vcpu *v,
 
 static const unsigned int valid_sizes[] = {
 sizeof(evc->vmce),
+VMCE_SIZE(mci_ctl2_bank1),
 VMCE_SIZE(caps),
 };
 #undef VMCE_SIZE
@@ -907,6 +908,7 @@ long arch_do_domctl(
 evc->vmce.caps = v->arch.vmce.mcg_cap;
 evc->vmce.mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2;
 evc->vmce.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2;
+evc->vmce.mcg_ext_ctl = v->arch.vmce.mcg_ext_ctl;
 
 ret = 0;
 vcpu_unpause(v);
diff --git a/xen/include/asm-x86/mce.h b/xen/include/asm-x86/mce.h
index 56ad1f92dd..35f9962638 100644
--- a/xen/include/asm-x86/mce.h
+++ b/xen/include/asm-x86/mce.h
@@ -27,6 +27,7 @@ struct vmce_bank {
 struct vmce {
 uint64_t mcg_cap;
 uint64_t mcg_status;
+uint64_t mcg_ext_ctl;
 spinlock_t lock;
 struct vmce_bank bank[GUEST_MC_BANK_NUM];
 };
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 816973b9c2..fd7bf3fb38 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -610,6 +610,7 @@ struct hvm_vmce_vcpu {
 uint64_t caps;
 uint64_t mci_ctl2_bank0;
 uint64_t mci_ctl2_bank1;
+uint64_t mcg_ext_ctl;
 };
 
 DECLARE_HVM_SAVE_TYPE(VMCE_VCPU, 18, struct hvm_vmce_vcpu);
-- 
2.11.0




[Xen-devel] [PATCH v8 4/7] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP

2017-07-09 Thread Haozhong Zhang
If LMCE is supported by the host and ' mca_caps = [ "lmce" ] ' is present
in the xl config, the LMCE capability will be exposed in the guest's
MSR_IA32_MCG_CAP. By default, LMCE is not exposed to the guest, so as to
preserve backwards migration compatibility.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com> for hypervisor side
Acked-by: Wei Liu <wei.l...@citrix.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 docs/man/xl.cfg.pod.5.in| 24 
 tools/libxc/xc_sr_save_x86_hvm.c|  1 +
 tools/libxl/libxl.h |  7 +++
 tools/libxl/libxl_dom.c | 15 +++
 tools/libxl/libxl_types.idl |  1 +
 tools/xl/xl_parse.c | 31 +--
 xen/arch/x86/cpu/mcheck/mce.h   |  1 +
 xen/arch/x86/cpu/mcheck/mce_intel.c |  2 +-
 xen/arch/x86/cpu/mcheck/vmce.c  | 19 ++-
 xen/arch/x86/hvm/hvm.c  |  5 +
 xen/include/asm-x86/mce.h   |  1 +
 xen/include/public/hvm/params.h |  7 ++-
 12 files changed, 109 insertions(+), 5 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index ff3203550f..79cb2eaea7 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -2173,6 +2173,30 @@ natively or via hardware backwards compatibility support.
 
 =back
 
+=head3 x86
+
+=over 4
+
+=item B

[Xen-devel] [PATCH v8 3/7] x86/vmce: enable injecting LMCE to guest on Intel host

2017-07-09 Thread Haozhong Zhang
Inject LMCE to the guest if the host MCE is an LMCE and the affected vcpu
is known. Otherwise, broadcast MCE to all vcpus on an Intel host.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 xen/arch/x86/cpu/mcheck/mcaction.c | 23 ---
 xen/arch/x86/cpu/mcheck/vmce.c | 11 ++-
 xen/arch/x86/cpu/mcheck/vmce.h |  2 +-
 3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mcaction.c b/xen/arch/x86/cpu/mcheck/mcaction.c
index ca17d22bd8..f959bed2cb 100644
--- a/xen/arch/x86/cpu/mcheck/mcaction.c
+++ b/xen/arch/x86/cpu/mcheck/mcaction.c
@@ -44,6 +44,7 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
 unsigned long mfn, gfn;
 uint32_t status;
 int vmce_vcpuid;
+unsigned int mc_vcpuid;
 
 if (!mc_check_addr(bank->mc_status, bank->mc_misc, MC_ADDR_PHYSICAL)) {
 dprintk(XENLOG_WARNING,
@@ -88,18 +89,26 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
 goto vmce_failed;
 }
 
-if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
-global->mc_vcpuid == XEN_MC_VCPUID_INVALID)
+mc_vcpuid = global->mc_vcpuid;
+if (mc_vcpuid == XEN_MC_VCPUID_INVALID ||
+/*
+ * Because MC# may happen asynchronously with the actual
+ * operation that triggers the error, the domain ID as
+ * well as the vCPU ID collected in 'global' at MC# are
+ * not always precise. In that case, fallback to broadcast.
+ */
+global->mc_domid != bank->mc_domid ||
+(boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+ (!(global->mc_gstatus & MCG_STATUS_LMCE) ||
+  !(d->vcpu[mc_vcpuid]->arch.vmce.mcg_ext_ctl &
+MCG_EXT_CTL_LMCE_EN
 vmce_vcpuid = VMCE_INJECT_BROADCAST;
 else
-vmce_vcpuid = global->mc_vcpuid;
+vmce_vcpuid = mc_vcpuid;
 
 bank->mc_addr = gfn << PAGE_SHIFT |
   (bank->mc_addr & (PAGE_SIZE -1 ));
-/* TODO: support injecting LMCE */
-if (fill_vmsr_data(bank, d,
-   global->mc_gstatus & ~MCG_STATUS_LMCE,
-   vmce_vcpuid == VMCE_INJECT_BROADCAST))
+if (fill_vmsr_data(bank, d, global->mc_gstatus, vmce_vcpuid))
 {
 mce_printk(MCE_QUIET, "Fill vMCE# data for DOM%d "
   "failed\n", bank->mc_domid);
diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index 060e2d0582..e2b3c5b8cc 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -465,14 +465,23 @@ static int vcpu_fill_mc_msrs(struct vcpu *v, uint64_t mcg_status,
 }
 
 int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
-   uint64_t gstatus, bool broadcast)
+   uint64_t gstatus, int vmce_vcpuid)
 {
 struct vcpu *v = d->vcpu[0];
+bool broadcast = (vmce_vcpuid == VMCE_INJECT_BROADCAST);
 int ret, err;
 
 if ( mc_bank->mc_domid == DOMID_INVALID )
 return -EINVAL;
 
+if ( broadcast )
+gstatus &= ~MCG_STATUS_LMCE;
+else if ( gstatus & MCG_STATUS_LMCE )
+{
+ASSERT(vmce_vcpuid >= 0 && vmce_vcpuid < d->max_vcpus);
+v = d->vcpu[vmce_vcpuid];
+}
+
 /*
  * vMCE with the actual error information is injected to vCPU0,
  * and, if broadcast is required, we choose to inject less severe
diff --git a/xen/arch/x86/cpu/mcheck/vmce.h b/xen/arch/x86/cpu/mcheck/vmce.h
index 74f6381460..2797e00275 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.h
+++ b/xen/arch/x86/cpu/mcheck/vmce.h
@@ -17,7 +17,7 @@ int vmce_amd_rdmsr(const struct vcpu *, uint32_t msr, uint64_t *val);
 int vmce_amd_wrmsr(struct vcpu *, uint32_t msr, uint64_t val);
 
 int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
-   uint64_t gstatus, bool broadcast);
+   uint64_t gstatus, int vmce_vcpuid);
 
 #define VMCE_INJECT_BROADCAST (-1)
 int inject_vmce(struct domain *d, int vcpu);
-- 
2.11.0




[Xen-devel] [PATCH v8 6/7] tools/libxc: add support of injecting MC# to specified CPUs

2017-07-09 Thread Haozhong Zhang
Though XEN_MC_inject_v2 allows injecting MC# to specified CPUs, the
current xc_mca_op() does not use this feature and does not provide an
interface to callers. This commit adds a new xc_mca_op_inject_v2() that
receives a cpumap providing the set of target CPUs.

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
Acked-by: Wei Liu <wei.l...@citrix.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/libxc/include/xenctrl.h |  2 ++
 tools/libxc/xc_misc.c | 52 ++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index c51bb3b448..552a4fd47d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1809,6 +1809,8 @@ int xc_cpuid_apply_policy(xc_interface *xch,
 void xc_cpuid_to_str(const unsigned int *regs,
  char **strs); /* some strs[] may be NULL if ENOMEM */
 int xc_mca_op(xc_interface *xch, struct xen_mc *mc);
+int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags,
+xc_cpumap_t cpumap, unsigned int nr_cpus);
 #endif
 
 struct xc_px_val {
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 88084fde30..2303293c6c 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -341,7 +341,57 @@ int xc_mca_op(xc_interface *xch, struct xen_mc *mc)
 xc_hypercall_bounce_post(xch, mc);
 return ret;
 }
-#endif
+
+int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags,
+xc_cpumap_t cpumap, unsigned int nr_bits)
+{
+int ret = -1;
+struct xen_mc mc_buf, *mc = &mc_buf;
+struct xen_mc_inject_v2 *inject = &mc->u.mc_inject_v2;
+
+DECLARE_HYPERCALL_BOUNCE(cpumap, 0, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+DECLARE_HYPERCALL_BOUNCE(mc, sizeof(*mc), XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
+
+memset(mc, 0, sizeof(*mc));
+
+if ( cpumap )
+{
+if ( !nr_bits )
+{
+errno = EINVAL;
+goto out;
+}
+
+HYPERCALL_BOUNCE_SET_SIZE(cpumap, (nr_bits + 7) / 8);
+if ( xc_hypercall_bounce_pre(xch, cpumap) )
+{
+PERROR("Could not bounce cpumap memory buffer");
+goto out;
+}
+set_xen_guest_handle(inject->cpumap.bitmap, cpumap);
+inject->cpumap.nr_bits = nr_bits;
+}
+
+inject->flags = flags;
+mc->cmd = XEN_MC_inject_v2;
+mc->interface_version = XEN_MCA_INTERFACE_VERSION;
+
+if ( xc_hypercall_bounce_pre(xch, mc) )
+{
+PERROR("Could not bounce xen_mc memory buffer");
+goto out_free_cpumap;
+}
+
+ret = xencall1(xch->xcall, __HYPERVISOR_mca, HYPERCALL_BUFFER_AS_ARG(mc));
+
+xc_hypercall_bounce_post(xch, mc);
+out_free_cpumap:
+if ( cpumap )
+xc_hypercall_bounce_post(xch, cpumap);
+out:
+return ret;
+}
+#endif /* __i386__ || __x86_64__ */
 
 int xc_perfc_reset(xc_interface *xch)
 {
-- 
2.11.0




[Xen-devel] [PATCH v8 5/7] xen/mce: add support of vLMCE injection to XEN_MC_inject_v2

2017-07-09 Thread Haozhong Zhang
Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 xen/arch/x86/cpu/mcheck/mce.c | 24 +++-
 xen/include/public/arch-x86/xen-mca.h |  1 +
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index ee04fb54ff..30525dd78b 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -1485,11 +1485,12 @@ long do_mca(XEN_GUEST_HANDLE_PARAM(xen_mc_t) u_xen_mc)
 {
 const cpumask_t *cpumap;
 cpumask_var_t cmv;
+bool broadcast = op->u.mc_inject_v2.flags & XEN_MC_INJECT_CPU_BROADCAST;
 
 if (nr_mce_banks == 0)
 return x86_mcerr("do_mca #MC", -ENODEV);
 
-if ( op->u.mc_inject_v2.flags & XEN_MC_INJECT_CPU_BROADCAST )
+if ( broadcast )
 cpumap = &cpu_online_map;
 else
 {
@@ -1529,6 +1530,27 @@ long do_mca(XEN_GUEST_HANDLE_PARAM(xen_mc_t) u_xen_mc)
 }
 break;
 
+case XEN_MC_INJECT_TYPE_LMCE:
+if ( !lmce_support )
+{
+ret = x86_mcerr("No LMCE support", -EINVAL);
+break;
+}
+if ( broadcast )
+{
+ret = x86_mcerr("Broadcast cannot be used with LMCE", -EINVAL);
+break;
+}
+/* Ensure at most one CPU is specified. */
+if ( nr_cpu_ids > cpumask_next(cpumask_first(cpumap), cpumap) )
+{
+ret = x86_mcerr("More than one CPU specified for LMCE",
+-EINVAL);
+break;
+}
+on_selected_cpus(cpumap, x86_mc_mceinject, NULL, 1);
+break;
+
 default:
 ret = x86_mcerr("Wrong mca type\n", -EINVAL);
 break;
diff --git a/xen/include/public/arch-x86/xen-mca.h b/xen/include/public/arch-x86/xen-mca.h
index 7db990723b..dc35267249 100644
--- a/xen/include/public/arch-x86/xen-mca.h
+++ b/xen/include/public/arch-x86/xen-mca.h
@@ -414,6 +414,7 @@ struct xen_mc_mceinject {
 #define XEN_MC_INJECT_TYPE_MASK 0x7
 #define XEN_MC_INJECT_TYPE_MCE  0x0
 #define XEN_MC_INJECT_TYPE_CMCI 0x1
+#define XEN_MC_INJECT_TYPE_LMCE 0x2
 
 #define XEN_MC_INJECT_CPU_BROADCAST 0x8
 
-- 
2.11.0
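The "at most one CPU" check in the hunk above relies on the cpumask_first/cpumask_next pair: if the bit position after the first set bit is already the nr_cpu_ids sentinel, then no second CPU is selected. A minimal, self-contained sketch of the same idea over a libxc-style byte-array bitmap (bitmap_first, bitmap_next, and at_most_one_cpu are hypothetical stand-ins, not Xen APIs):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for cpumask_first()/cpumask_next(): scan a
 * byte-array bitmap and return the index of the next set bit at or
 * after 'start', or 'nbits' (an nr_cpu_ids-style sentinel) if none. */
static size_t bitmap_next(const unsigned char *map, size_t nbits, size_t start)
{
    for (size_t i = start; i < nbits; i++)
        if (map[i / 8] & (1u << (i % 8)))
            return i;
    return nbits;
}

static size_t bitmap_first(const unsigned char *map, size_t nbits)
{
    return bitmap_next(map, nbits, 0);
}

/* At most one bit is set iff the bit after the first one is the
 * sentinel, which is the shape of the hypervisor check above. */
static int at_most_one_cpu(const unsigned char *map, size_t nbits)
{
    return bitmap_next(map, nbits, bitmap_first(map, nbits) + 1) == nbits;
}
```

Note that an empty map also passes the check: bitmap_first() returns the sentinel, so the subsequent bitmap_next() does as well.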


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v8 7/7] tools/xen-mceinj: add support of injecting LMCE

2017-07-09 Thread Haozhong Zhang
If option '-l' or '--lmce' is specified and the host supports LMCE,
xen-mceinj will inject LMCE to CPU specified by '-c' (or CPU0 if '-c'
is not present).

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
Acked-by: Wei Liu <wei.l...@citrix.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/tests/mce-test/tools/xen-mceinj.c | 50 +++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/tools/tests/mce-test/tools/xen-mceinj.c b/tools/tests/mce-test/tools/xen-mceinj.c
index bae5a46eb5..380e42190c 100644
--- a/tools/tests/mce-test/tools/xen-mceinj.c
+++ b/tools/tests/mce-test/tools/xen-mceinj.c
@@ -56,6 +56,8 @@
 #define MSR_IA32_MC0_MISC0x0403
 #define MSR_IA32_MC0_CTL20x0280
 
+#define MCG_STATUS_LMCE  0x8
+
 struct mce_info {
 const char *description;
 uint8_t mcg_stat;
@@ -113,6 +115,7 @@ static struct mce_info mce_table[] = {
 #define LOGFILE stdout
 
 int dump;
+int lmce;
 struct xen_mc_msrinject msr_inj;
 
 static void Lprintf(const char *fmt, ...)
@@ -212,6 +215,35 @@ static int inject_mce(xc_interface *xc_handle, int cpu_nr)
 return xc_mca_op(xc_handle, &mc);
 }
 
+static int inject_lmce(xc_interface *xc_handle, unsigned int cpu)
+{
+uint8_t *cpumap = NULL;
+size_t cpumap_size, line, shift;
+unsigned int nr_cpus;
+int ret;
+
+nr_cpus = mca_cpuinfo(xc_handle);
+if ( !nr_cpus )
+err(xc_handle, "Failed to get mca_cpuinfo");
+if ( cpu >= nr_cpus )
+err(xc_handle, "-c %u is larger than %u", cpu, nr_cpus - 1);
+
+cpumap_size = (nr_cpus + 7) / 8;
+cpumap = malloc(cpumap_size);
+if ( !cpumap )
+err(xc_handle, "Failed to allocate cpumap\n");
+memset(cpumap, 0, cpumap_size);
+line = cpu / 8;
+shift = cpu % 8;
+memset(cpumap + line, 1 << shift, 1);
+
+ret = xc_mca_op_inject_v2(xc_handle, XEN_MC_INJECT_TYPE_LMCE,
+  cpumap, cpumap_size * 8);
+
+free(cpumap);
+return ret;
+}
+
 static uint64_t bank_addr(int bank, int type)
 {
 uint64_t addr;
@@ -330,8 +362,15 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce,
   uint32_t cpu_nr, uint32_t domain, uint64_t gaddr)
 {
 int ret = 0;
+uint8_t mcg_status = mce->mcg_stat;
 
-ret = inject_mcg_status(xc_handle, cpu_nr, mce->mcg_stat, domain);
+if ( lmce )
+{
+if ( mce->cmci )
+err(xc_handle, "No support to inject CMCI as LMCE");
+mcg_status |= MCG_STATUS_LMCE;
+}
+ret = inject_mcg_status(xc_handle, cpu_nr, mcg_status, domain);
 if ( ret )
 err(xc_handle, "Failed to inject MCG_STATUS MSR");
 
@@ -354,6 +393,8 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce,
 err(xc_handle, "Failed to inject MSR");
 if ( mce->cmci )
 ret = inject_cmci(xc_handle, cpu_nr);
+else if ( lmce )
+ret = inject_lmce(xc_handle, cpu_nr);
 else
 ret = inject_mce(xc_handle, cpu_nr);
 if ( ret )
@@ -393,6 +434,7 @@ static struct option opts[] = {
 {"dump", 0, 0, 'D'},
 {"help", 0, 0, 'h'},
 {"page", 0, 0, 'p'},
+{"lmce", 0, 0, 'l'},
 {"", 0, 0, '\0'}
 };
 
@@ -409,6 +451,7 @@ static void help(void)
"  -d, --domain=DOMID   target domain, the default is Xen itself\n"
"  -h, --help   print this page\n"
"  -p, --page=ADDR  physical address to report\n"
+   "  -l, --lmce   inject as LMCE (Intel only)\n"
"  -t, --type=ERROR error type\n");
 
 for ( i = 0; i < MCE_TABLE_SIZE; i++ )
@@ -438,7 +481,7 @@ int main(int argc, char *argv[])
 }
 
 while ( 1 ) {
-c = getopt_long(argc, argv, "c:Dd:t:hp:", opts, &opt_index);
+c = getopt_long(argc, argv, "c:Dd:t:hp:l", opts, &opt_index);
 if ( c == -1 )
 break;
 switch ( c ) {
@@ -463,6 +506,9 @@ int main(int argc, char *argv[])
 case 't':
 type = strtol(optarg, NULL, 0);
 break;
+case 'l':
+lmce = 1;
+break;
 case 'h':
 default:
 help();
-- 
2.11.0
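inject_lmce() in the patch above marks the target CPU in a libxc-style cpumap, where bit (cpu % 8) of byte (cpu / 8) represents CPU cpu; its memset() call writes a single byte and is equivalent to an ordinary bit-set on a zeroed map. A sketch of the same construction (make_cpumap is a hypothetical helper, not part of the tool):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zeroed cpumap sized for nr_cpus and
 * set only the bit for the target CPU, mirroring inject_lmce(). */
static unsigned char *make_cpumap(unsigned int cpu, unsigned int nr_cpus,
                                  size_t *size_out)
{
    size_t size = (nr_cpus + 7) / 8;        /* round up to whole bytes */
    unsigned char *map = calloc(size, 1);

    if (!map)
        return NULL;
    /* Same effect as the tool's memset(cpumap + cpu / 8, 1 << (cpu % 8), 1)
     * on a freshly zeroed buffer. */
    map[cpu / 8] |= 1u << (cpu % 8);
    if (size_out)
        *size_out = size;
    return map;
}
```

Passing cpumap_size * 8 as the bit count to xc_mca_op_inject_v2(), as the patch does, matches this byte-rounded allocation.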




[Xen-devel] [PATCH v7 4/7] x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP

2017-07-06 Thread Haozhong Zhang
If LMCE is supported by host and ' mca_caps = [ "lmce" ] ' is present
in xl config, the LMCE capability will be exposed in guest MSR_IA32_MCG_CAP.
By default, LMCE is not exposed to guest so as to keep the backwards migration
compatibility.
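
A hypothetical xl guest configuration enabling this would carry a line like the one below; only the mca_caps setting is taken from the description above, and every other value is a placeholder:

```
name = "hvm-guest"
builder = "hvm"
memory = 2048
vcpus = 4
mca_caps = [ "lmce" ]
```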

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com> for hypervisor side
Acked-by: Wei Liu <wei.l...@citrix.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 docs/man/xl.cfg.pod.5.in| 24 
 tools/libxc/xc_sr_save_x86_hvm.c|  1 +
 tools/libxl/libxl.h |  7 +++
 tools/libxl/libxl_dom.c | 15 +++
 tools/libxl/libxl_types.idl |  1 +
 tools/xl/xl_parse.c | 31 +--
 xen/arch/x86/cpu/mcheck/mce.h   |  1 +
 xen/arch/x86/cpu/mcheck/mce_intel.c |  2 +-
 xen/arch/x86/cpu/mcheck/vmce.c  | 19 ++-
 xen/arch/x86/hvm/hvm.c  |  5 +
 xen/include/asm-x86/mce.h   |  1 +
 xen/include/public/hvm/params.h |  7 ++-
 12 files changed, 109 insertions(+), 5 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index ff3203550f..79cb2eaea7 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -2173,6 +2173,30 @@ natively or via hardware backwards compatibility support.
 
 =back
 
+=head3 x86
+
+=over 4
+
+=item B

[Xen-devel] [PATCH v7 0/7] Add LMCE support

2017-07-06 Thread Haozhong Zhang
v7 is based on staging branch and only contains the remaining patches.

Changes in v7:
 * (Patch 1) Introduce a general way to restore vMCE parameters.
 * (Patch 2) Adapt to the change in patch 1.
 * Other patch 3 - 7 remain the same as v5 patch 7 - 11.

Haozhong Zhang (7):
  [N   ] 1/7 x86/domctl: generalize the restore of vMCE parameters
  [ M  ] 2/7 x86/vmce: emulate MSR_IA32_MCG_EXT_CTL
  [  R ] 3/7 x86/vmce: enable injecting LMCE to guest on Intel host
  [  RA] 4/7 x86/vmce, tools/libxl: expose LMCE capability in guest 
MSR_IA32_MCG_CAP
  [  R ] 5/7 xen/mce: add support of vLMCE injection to XEN_MC_inject_v2
  [   A] 6/7 tools/libxc: add support of injecting MC# to specified CPUs
  [   A] 7/7 tools/xen-mceinj: add support of injecting LMCE

 N: new in this version
 M: modified in this version
 R: got R-b
 A: got A-b

 docs/man/xl.cfg.pod.5.in| 24 +
 tools/libxc/include/xenctrl.h   |  2 ++
 tools/libxc/xc_misc.c   | 52 ++-
 tools/libxc/xc_sr_save_x86_hvm.c|  1 +
 tools/libxl/libxl.h |  7 
 tools/libxl/libxl_dom.c | 15 
 tools/libxl/libxl_types.idl |  1 +
 tools/tests/mce-test/tools/xen-mceinj.c | 50 --
 tools/xl/xl_parse.c | 31 ++--
 xen/arch/x86/cpu/mcheck/mcaction.c  | 23 
 xen/arch/x86/cpu/mcheck/mce.c   | 24 -
 xen/arch/x86/cpu/mcheck/mce.h   |  1 +
 xen/arch/x86/cpu/mcheck/mce_intel.c |  2 +-
 xen/arch/x86/cpu/mcheck/vmce.c  | 64 +++--
 xen/arch/x86/cpu/mcheck/vmce.h  |  2 +-
 xen/arch/x86/domctl.c   | 53 ++-
 xen/arch/x86/hvm/hvm.c  |  5 +++
 xen/include/asm-x86/mce.h   |  2 ++
 xen/include/public/arch-x86/hvm/save.h  |  1 +
 xen/include/public/arch-x86/xen-mca.h   |  1 +
 xen/include/public/hvm/params.h |  7 +++-
 21 files changed, 332 insertions(+), 36 deletions(-)

-- 
2.11.0




[Xen-devel] [PATCH v7 7/7] tools/xen-mceinj: add support of injecting LMCE

2017-07-06 Thread Haozhong Zhang
If option '-l' or '--lmce' is specified and the host supports LMCE,
xen-mceinj will inject LMCE to CPU specified by '-c' (or CPU0 if '-c'
is not present).

Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
Acked-by: Wei Liu <wei.l...@citrix.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
---
 tools/tests/mce-test/tools/xen-mceinj.c | 50 +++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/tools/tests/mce-test/tools/xen-mceinj.c b/tools/tests/mce-test/tools/xen-mceinj.c
index bae5a46eb5..380e42190c 100644
--- a/tools/tests/mce-test/tools/xen-mceinj.c
+++ b/tools/tests/mce-test/tools/xen-mceinj.c
@@ -56,6 +56,8 @@
 #define MSR_IA32_MC0_MISC0x0403
 #define MSR_IA32_MC0_CTL20x0280
 
+#define MCG_STATUS_LMCE  0x8
+
 struct mce_info {
 const char *description;
 uint8_t mcg_stat;
@@ -113,6 +115,7 @@ static struct mce_info mce_table[] = {
 #define LOGFILE stdout
 
 int dump;
+int lmce;
 struct xen_mc_msrinject msr_inj;
 
 static void Lprintf(const char *fmt, ...)
@@ -212,6 +215,35 @@ static int inject_mce(xc_interface *xc_handle, int cpu_nr)
 return xc_mca_op(xc_handle, &mc);
 }
 
+static int inject_lmce(xc_interface *xc_handle, unsigned int cpu)
+{
+uint8_t *cpumap = NULL;
+size_t cpumap_size, line, shift;
+unsigned int nr_cpus;
+int ret;
+
+nr_cpus = mca_cpuinfo(xc_handle);
+if ( !nr_cpus )
+err(xc_handle, "Failed to get mca_cpuinfo");
+if ( cpu >= nr_cpus )
+err(xc_handle, "-c %u is larger than %u", cpu, nr_cpus - 1);
+
+cpumap_size = (nr_cpus + 7) / 8;
+cpumap = malloc(cpumap_size);
+if ( !cpumap )
+err(xc_handle, "Failed to allocate cpumap\n");
+memset(cpumap, 0, cpumap_size);
+line = cpu / 8;
+shift = cpu % 8;
+memset(cpumap + line, 1 << shift, 1);
+
+ret = xc_mca_op_inject_v2(xc_handle, XEN_MC_INJECT_TYPE_LMCE,
+  cpumap, cpumap_size * 8);
+
+free(cpumap);
+return ret;
+}
+
 static uint64_t bank_addr(int bank, int type)
 {
 uint64_t addr;
@@ -330,8 +362,15 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce,
   uint32_t cpu_nr, uint32_t domain, uint64_t gaddr)
 {
 int ret = 0;
+uint8_t mcg_status = mce->mcg_stat;
 
-ret = inject_mcg_status(xc_handle, cpu_nr, mce->mcg_stat, domain);
+if ( lmce )
+{
+if ( mce->cmci )
+err(xc_handle, "No support to inject CMCI as LMCE");
+mcg_status |= MCG_STATUS_LMCE;
+}
+ret = inject_mcg_status(xc_handle, cpu_nr, mcg_status, domain);
 if ( ret )
 err(xc_handle, "Failed to inject MCG_STATUS MSR");
 
@@ -354,6 +393,8 @@ static int inject(xc_interface *xc_handle, struct mce_info *mce,
 err(xc_handle, "Failed to inject MSR");
 if ( mce->cmci )
 ret = inject_cmci(xc_handle, cpu_nr);
+else if ( lmce )
+ret = inject_lmce(xc_handle, cpu_nr);
 else
 ret = inject_mce(xc_handle, cpu_nr);
 if ( ret )
@@ -393,6 +434,7 @@ static struct option opts[] = {
 {"dump", 0, 0, 'D'},
 {"help", 0, 0, 'h'},
 {"page", 0, 0, 'p'},
+{"lmce", 0, 0, 'l'},
 {"", 0, 0, '\0'}
 };
 
@@ -409,6 +451,7 @@ static void help(void)
"  -d, --domain=DOMID   target domain, the default is Xen itself\n"
"  -h, --help   print this page\n"
"  -p, --page=ADDR  physical address to report\n"
+   "  -l, --lmce   inject as LMCE (Intel only)\n"
"  -t, --type=ERROR error type\n");
 
 for ( i = 0; i < MCE_TABLE_SIZE; i++ )
@@ -438,7 +481,7 @@ int main(int argc, char *argv[])
 }
 
 while ( 1 ) {
-c = getopt_long(argc, argv, "c:Dd:t:hp:", opts, &opt_index);
+c = getopt_long(argc, argv, "c:Dd:t:hp:l", opts, &opt_index);
 if ( c == -1 )
 break;
 switch ( c ) {
@@ -463,6 +506,9 @@ int main(int argc, char *argv[])
 case 't':
 type = strtol(optarg, NULL, 0);
 break;
+case 'l':
+lmce = 1;
+break;
 case 'h':
 default:
 help();
-- 
2.11.0



