On Mon, Jul 10, 2017 at 04:41:35AM -0600, Jan Beulich wrote:
> >>> On 10.07.17 at 12:10, wrote:
> > I would like to verify on which NUMA node the PFNs used by a HVM guest
> > are located. Is there an API for that? Something like:
> >
> > foreach (pfn, domid)
> > mfns_per_node[pfn_to_node(pfn)]++
> > foreach (node)
> > printk("%x %x\n", node, mfns_per_node[node])
>
> phys_to_nid() ?
So I wrote some code for exactly this against Xen 4.4.4, along with
creation of a PGM map to see the NUMA node locality.
Attaching them here.
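
For reference, the counting loop sketched above looks roughly like this
inside the hypervisor (a minimal sketch, assuming phys_to_nid(),
page_to_maddr() and page_list_for_each() as in a recent tree; the
function name dump_domain_numa() is made up for illustration):

static void dump_domain_numa(struct domain *d)
{
    unsigned long mfns_per_node[MAX_NUMNODES] = { 0 };
    struct page_info *page;
    nodeid_t node;

    /* Walk every page assigned to the domain, bucketing by node. */
    spin_lock(&d->page_alloc_lock);
    page_list_for_each ( page, &d->page_list )
        mfns_per_node[phys_to_nid(page_to_maddr(page))]++;
    spin_unlock(&d->page_alloc_lock);

    for_each_online_node ( node )
        printk("node %u: %lu pages\n", node, mfns_per_node[node]);
}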
>
> Jan
>
>
From a5e039801c989df29b704a4a5256715321906535 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk
Date: Tue, 6 Jun 2017 20:31:21 -0400
Subject: [PATCH 1/7] xen/x86: XENDOMCTL_get_memlist: Make it work

This hypercall has a bunch of problems which this patch fixes:
it is not preemption capable, takes a nested lock, and the data
is stale by the time the caller gets to look at it.
The nested lock (and order inversion) is due to the
copy_to_guest_offset call. The particular implementation
(see __hvm_copy) makes P2M calls (p2m_mem_paging_populate), which
take the p2m_lock.
We avoid this by taking the p2m lock early (before the
page_alloc_lock) in:

  if ( !guest_handle_okay(domctl->u.getmemlist.buffer, max_pfns) )

which takes the p2m lock and then unlocks it. Since the buffer
checks out, we can use the fast variant of copy_to_guest
(which still takes the p2m lock).
We extend this thinking to the copying of the values to the guest.
The loop that copies the mfns[] to the buffer (potentially) takes
a p2m lock on every iteration. So, to avoid holding the
page_alloc_lock across those copies, we create a temporary array
(mfns) which is filled while holding the page_alloc_lock, but no
locks (well, only the domctl lock) are held while copying to the
guest.
Preemption is implemented, and we also honor 'start_pfn', which is
renamed to 'index' as there is no enforced order in which the pages
correspond to PFNs.
All of those are fixed by this patch. It also means that callers of
xc_get_pfn_list have to take into account that max_pfns may not
equal the number of PFNs returned, and loop around. See the
follow-up patches "libxc: Use XENDOMCTL_get_memlist properly"
and "xen-mceinj: Loop around xc_get_pfn_list".
Signed-off-by: Konrad Rzeszutek Wilk
---
 xen/arch/x86/domctl.c       | 76 ++---
 xen/arch/x86/mm/hap/hap.c   |  1 +
 xen/arch/x86/mm/p2m-ept.c   |  2 ++
 xen/include/asm-x86/p2m.h   |  2 ++
 xen/include/public/domctl.h | 36 -
 5 files changed, 84 insertions(+), 33 deletions(-)
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index bebe1fb..3af6b39 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -325,57 +325,83 @@ long arch_do_domctl(
 
     case XEN_DOMCTL_getmemlist:
     {
-        int i;
+#define XEN_DOMCTL_getmemlist_max_pfns (GB(1) / PAGE_SIZE)
+        unsigned int i = 0, idx = 0;
         unsigned long max_pfns = domctl->u.getmemlist.max_pfns;
+        unsigned long index = domctl->u.getmemlist.index;
         uint64_t mfn;
         struct page_info *page;
+        uint64_t *mfns;
 
         if ( unlikely(d->is_dying) ) {
             ret = -EINVAL;
             break;
         }
 
-        /*
-         * XSA-74: This sub-hypercall is broken in several ways:
-         * - lock order inversion (p2m locks inside page_alloc_lock)
-         * - no preemption on huge max_pfns input
-         * - not (re-)checking d->is_dying with page_alloc_lock held
-         * - not honoring start_pfn input (which libxc also doesn't set)
-         * Additionally it is rather useless, as the result is stale by the
-         * time the caller gets to look at it.
-         * As it only has a single, non-production consumer (xen-mceinj),
-         * rather than trying to fix it we restrict it for the time being.
-         */
-        if ( /* No nested locks inside copy_to_guest_offset(). */
-             paging_mode_external(current->domain) ||
-             /* Arbitrary limit capping processing time. */
-             max_pfns > GB(4) / PAGE_SIZE )
+        /* XSA-74: This sub-hypercall is fixed. */
+        ret = -E2BIG;
+        if ( max_pfns > XEN_DOMCTL_getmemlist_max_pfns )
+            max_pfns = XEN_DOMCTL_getmemlist_max_pfns;
+
+        /* Report the max number we are OK with. */
+        if ( !max_pfns && guest_handle_is_null(domctl->u.getmemlist.buffer) )
         {
-            ret = -EOPNOTSUPP;
+            domctl->u.getmemlist.max_pfns = XEN_DOMCTL_getmemlist_max_pfns;
+            copyback = 1;
             break;
         }
 
-        spin_lock(&d->page_alloc_lock);
+        ret = -EINVAL;
+        if ( !guest_handle_okay(domctl->u.getmemlist.buffer, max_pfns) )
+            break;
+
+        mfns = xmalloc_array(uint64_t, max_pfns);
+        if ( !mfns )
+        {
+            ret = -ENOMEM;
+            break;