Re: [Xen-devel] REGRESSION: Xen 4.13 RC5 fails to bootstrap Dom0 on ARM

2019-12-18 Thread Julien Grall

Hi Roman,

On 19/12/2019 00:28, Roman Shaposhnik wrote:

On Wed, Dec 18, 2019 at 2:17 PM Julien Grall  wrote:


Hi Roman,

On 18/12/2019 17:03, Roman Shaposhnik wrote:

On Wed, Dec 18, 2019 at 3:50 AM Julien Grall  wrote:
So -- nothing boots directly by UEFI -- everything goes through GRUB.

However, my understanding is that GRUB will detect devicetree
information provided by UEFI (even though devicetree command is
supposed to completely replace that). Hence it is possible that Linux
relies on some residuals left in memory by GRUB that Xen doesn't pay
attention to (but this is a pretty wild speculation only).


While it goes through GRUB, it is a bootloader and will just act as a
proxy for EFI. So an EFI application such as Xen/Linux can still be loaded
and take advantage of runtime services if present/implemented.


Aha! So then it depends on Xen actually using those EFI services. Which
leads to my first question:
1. would it be possible to stay completely with just devicetrees information
by passing efi=no-rs to Xen?
This will only disable the runtime services (note that they are not
supported on Xen on Arm today). What I described above is part of the
boot services and can't be disabled.


Also, I am not entirely sure GRUB/EFI will update your device-tree to
point out the memory that was carved out for things like ATF.


Looking at the DTS memory node you provided in another e-mail, it seems 
the memory map is slightly different.





In fact most people on Arm are using GRUB rather than EFI directly as
this is more friendly to use.

Regarding the devicetree, Xen and Linux will completely ignore the
memory nodes in Xen if using EFI. This is because the EFI memory map will
give you an overview of the platform with the EFI regions included.


Aha! So in that sense it is a bug in Xen after all, right? (that's what you're
referring to when you say you now understand what needs to get fixed).


Yes. The EFI memory map is a list of existing memory with a type
associated to it (Conventional, BootServicesCode, MemoryMappedIO...).


The OS/Hypervisor will have to go through them and check which regions
are usable. Compared to Linux, Xen has limited itself to only a few types.


However, I think we can be on a par with Linux here.
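
To illustrate the kind of walk described above, here is a minimal sketch in
C. It is not Xen's or Linux's code: the descriptor layout (struct mem_desc),
the helper names and the choice of which types count as usable are
assumptions made for the example. Only the memory type names and their
numbering follow the UEFI specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Memory types, numbered as in the UEFI specification (0..11). */
enum efi_mem_type {
    EFI_RESERVED_MEMORY_TYPE,
    EFI_LOADER_CODE,
    EFI_LOADER_DATA,
    EFI_BOOT_SERVICES_CODE,
    EFI_BOOT_SERVICES_DATA,
    EFI_RUNTIME_SERVICES_CODE,
    EFI_RUNTIME_SERVICES_DATA,
    EFI_CONVENTIONAL_MEMORY,
    EFI_UNUSABLE_MEMORY,
    EFI_ACPI_RECLAIM_MEMORY,
    EFI_ACPI_MEMORY_NVS,
    EFI_MEMORY_MAPPED_IO,
};

/* Hypothetical, simplified descriptor: one entry of the memory map. */
struct mem_desc {
    enum efi_mem_type type;
    uint64_t phys_start;
    uint64_t num_pages;
};

/*
 * Assumed policy for this sketch: treat conventional memory plus the
 * loader and boot-services regions as general-purpose RAM once boot
 * services have been exited. Everything else is left alone.
 */
int is_usable_ram(enum efi_mem_type type)
{
    switch (type) {
    case EFI_CONVENTIONAL_MEMORY:
    case EFI_LOADER_CODE:
    case EFI_LOADER_DATA:
    case EFI_BOOT_SERVICES_CODE:
    case EFI_BOOT_SERVICES_DATA:
        return 1;
    default:
        return 0;
    }
}

/* Walk the map and add up the pages the policy above accepts. */
uint64_t count_usable_pages(const struct mem_desc *map, size_t n)
{
    uint64_t total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (is_usable_ram(map[i].type))
            total += map[i].num_pages;

    return total;
}
```

In this sketch the switch in is_usable_ram() is the whole policy decision:
a consumer that accepts fewer types ends up with less RAM from the same map.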

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork

2019-12-18 Thread Julien Grall

Hi Tamas,

On 19/12/2019 00:15, Tamas K Lengyel wrote:

On Wed, Dec 18, 2019 at 4:02 PM Julien Grall  wrote:


Hi,

On 18/12/2019 22:33, Tamas K Lengyel wrote:

On Wed, Dec 18, 2019 at 3:00 PM Julien Grall  wrote:


Hi Tamas,

On 18/12/2019 19:40, Tamas K Lengyel wrote:

Implement hypercall that allows a fork to shed all memory that got allocated
for it during its execution and re-load its vCPU context from the parent VM.
This allows the forked VM to reset into the same state the parent VM is in a
faster way than creating a new fork would be. Measurements show about a 2x
speedup during normal fuzzing operations. Performance may vary depending how
much memory got allocated for the forked VM. If it has been completely
deduplicated from the parent VM then creating a new fork would likely be more
performant.

Signed-off-by: Tamas K Lengyel 
---
xen/arch/x86/mm/mem_sharing.c | 105 ++
xen/include/public/memory.h   |   1 +
2 files changed, 106 insertions(+)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index e93ad2ec5a..4735a334b9 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct 
domain *cd)
return 0;
}

+struct gfn_free;
+struct gfn_free {
+struct gfn_free *next;
+struct page_info *page;
+gfn_t gfn;
+};
+
+static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
+{
+int rc;
+
+struct p2m_domain* p2m = p2m_get_hostp2m(cd);
+struct gfn_free *list = NULL;
+struct page_info *page;
+
+page_list_for_each(page, &cd->page_list)


AFAICT, your domain is not paused, so it would be possible to have pages
added/removed in that list behind your back.


Well, it's not that it's not paused, it's just that I haven't added a
sanity check to make sure it is. The toolstack can (and should) pause
it, so that sanity check would be warranted.

I have only read the hypervisor part, so I didn't know what the
toolstack has done.


I've added the same enforced VM paused operation that is present for
the fork hypercall handler.







You also have multiple loops on the page_list in this function. Given the
number of pages in page_list can be quite big, this is a call for hogging the
pCPU and an RCU lock on the domain vCPU running this call.


There is just one loop over page_list itself, the second loop is on
the internal list that is being built here which will be a subset. The
list itself in fact should be small (in our tests usually <100).


First, nothing in this function tells me that there will be only
100 pages. But then, I don't think it is right to implement your
hypercall based only on the "normal" scenario. You should also think about
the "worst" case scenario.

In this case the worst case scenario is to have hundreds of pages in page_list.


Well, this is only an experimental system that's completely disabled
by default. Making the assumption that people who make use of it will
know what they are doing I think is fair.


I assume that if you submit this new hypercall upstream then there is a
longer-term plan to have more people use it and potentially make it
"stable". If not, then it raises the question of why this is pushed upstream...


In any case, all the known assumptions should be documented so they can 
be fixed rather than forgotten until it is rediscovered via an XSA.







Granted the list can grow larger, but in those cases it's likely better
to just discard the fork and create a new one. So in my opinion adding
a hypercall continuation to this is not needed


How would the caller know it? What would happen if the caller ends up
calling this with a growing list?


The caller knows by virtue of knowing how long the VM was executed
for. In the use case this is targeted at, the VM was executing only for
a couple seconds at most. Usually much less than that (we get about
80 resets/s with AFL). During that time it's extremely unlikely you
get more than ~100 pages deduplicated (that is, written to). But
even if there are more pages, it just means the hypercall might take a
bit longer to run for that iteration.


I assume if you upstream the code then you want more people to use it
(otherwise what's the point?). In this case, you will likely have people
who heard about the feature and want to test it but don't know the internals.


Such users need to know how this can be called safely without reading the
implementation. In other words, some documentation for your hypercall is
needed.



I don't see any issue with not
breaking up this hypercall with continuation even under the worst case
situation though.


Xen only supports voluntary preemption, which means that a hypercall can
only be preempted if there is code for it. Otherwise the preemption will
mostly only happen when returning to the guest.


In other words, the vCPU executing the hypercall may go past its
timeslice and prevent other vCPUs from running.
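
One common way to bound the work done per call is to process the list in
chunks and hand back a resume point. The sketch below shows only that idea
in plain C. It uses no Xen API, and the function name, the flat array
standing in for the page list, and the budget parameter are all hypothetical.

```c
#include <stddef.h>

/*
 * Clear entries [start, count) of `dirty`, but touch at most `budget`
 * entries per call. Returns the index to resume from: equal to `count`
 * when the work is finished, smaller than `count` when the caller should
 * call again (this is the point where a real implementation would yield).
 *
 * A budget of 0 is treated as 1 so that every call makes progress.
 */
size_t reset_pages_bounded(int *dirty, size_t count, size_t start,
                           size_t budget)
{
    size_t end, i;

    if (start >= count)
        return count;

    if (budget == 0)
        budget = 1;

    /* Clamp the end of this chunk to the end of the array. */
    end = (count - start > budget) ? start + budget : count;

    for (i = start; i < end; i++)
        dirty[i] = 0;

    return end;
}
```

A caller would loop, passing the returned index back in as `start`, until
the return value equals `count`. The cost of the worst case is then spread
over several short calls instead of one call of unbounded length.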



[Xen-devel] [PATCH v3 1/2] xen: put more code under CONFIG_CRASH_DEBUG

2019-12-18 Thread Juergen Gross
Some code is only needed with CONFIG_CRASH_DEBUG, so only include it if
CONFIG_CRASH_DEBUG is defined.

While at it remove CONFIG_HAS_GDBSX as it can easily be replaced by
CONFIG_CRASH_DEBUG.

Signed-off-by: Juergen Gross 
---
V3:
- move domain_pause_for_debugger() into arch/x86/domain.c (Andrew Cooper)
---
 xen/arch/x86/Kconfig|  1 -
 xen/arch/x86/domain.c   | 13 +
 xen/arch/x86/hvm/vmx/realmode.c |  1 +
 xen/common/Kconfig  |  3 ---
 xen/common/domain.c | 14 --
 xen/include/asm-x86/debugger.h  | 32 
 xen/include/xen/sched.h |  1 -
 7 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 02bb05f42e..f853c04564 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -13,7 +13,6 @@ config X86
select HAS_EHCI
select HAS_EX_TABLE
select HAS_FAST_MULTIPLY
-   select HAS_GDBSX
select HAS_IOPORTS
select HAS_KEXEC
select MEM_ACCESS_ALWAYS_ON
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 7cb7fd31dd..3a3fbde642 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2318,6 +2318,19 @@ static int __init init_vcpu_kick_softirq(void)
 }
 __initcall(init_vcpu_kick_softirq);
 
+void domain_pause_for_debugger(void)
+{
+#ifdef CONFIG_CRASH_DEBUG
+struct vcpu *curr = current;
+struct domain *d = curr->domain;
+
+domain_pause_by_systemcontroller_nosync(d);
+
+/* if gdbsx active, we just need to pause the domain */
+if ( curr->arch.gdbsx_vcpu_event == 0 )
+send_global_virq(VIRQ_DEBUGGER);
+#endif
+}
 
 /*
  * Local variables:
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index bb0b4439df..bdbd9cb921 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 2f516da101..b3d161d057 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -57,9 +57,6 @@ config HAS_UBSAN
 config HAS_KEXEC
bool
 
-config HAS_GDBSX
-   bool
-
 config HAS_IOPORTS
bool
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 66c7fc..3a77d717db 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -915,20 +915,6 @@ void vcpu_end_shutdown_deferral(struct vcpu *v)
 vcpu_check_shutdown(v);
 }
 
-#ifdef CONFIG_HAS_GDBSX
-void domain_pause_for_debugger(void)
-{
-struct vcpu *curr = current;
-struct domain *d = curr->domain;
-
-domain_pause_by_systemcontroller_nosync(d);
-
-/* if gdbsx active, we just need to pause the domain */
-if ( curr->arch.gdbsx_vcpu_event == 0 )
-send_global_virq(VIRQ_DEBUGGER);
-}
-#endif
-
 /* Complete domain destroy after RCU readers are not holding old references. */
 static void complete_domain_destroy(struct rcu_head *head)
 {
diff --git a/xen/include/asm-x86/debugger.h b/xen/include/asm-x86/debugger.h
index b1b627f1fa..f58726daec 100644
--- a/xen/include/asm-x86/debugger.h
+++ b/xen/include/asm-x86/debugger.h
@@ -33,6 +33,8 @@
 #include 
 #include 
 
+void domain_pause_for_debugger(void);
+
 #ifdef CONFIG_CRASH_DEBUG
 
 #include 
@@ -47,18 +49,6 @@ static inline bool debugger_trap_fatal(
 /* Int3 is a trivial way to gather cpu_user_regs context. */
 #define debugger_trap_immediate() __asm__ __volatile__ ( "int3" );
 
-#else
-
-static inline bool debugger_trap_fatal(
-unsigned int vector, struct cpu_user_regs *regs)
-{
-return false;
-}
-
-#define debugger_trap_immediate() ((void)0)
-
-#endif
-
 static inline bool debugger_trap_entry(
 unsigned int vector, struct cpu_user_regs *regs)
 {
@@ -84,6 +74,24 @@ static inline bool debugger_trap_entry(
 return false;
 }
 
+#else
+
+static inline bool debugger_trap_fatal(
+unsigned int vector, struct cpu_user_regs *regs)
+{
+return false;
+}
+
+#define debugger_trap_immediate() ((void)0)
+
+static inline bool debugger_trap_entry(
+unsigned int vector, struct cpu_user_regs *regs)
+{
+return false;
+}
+
+#endif
+
 unsigned int dbg_rw_mem(void * __user addr, void * __user buf,
 unsigned int len, domid_t domid, bool toaddr,
 uint64_t pgd3);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 9f7bc69293..0b41e936d5 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -652,7 +652,6 @@ void domain_destroy(struct domain *d);
 int domain_kill(struct domain *d);
 int domain_shutdown(struct domain *d, u8 reason);
 void domain_resume(struct domain *d);
-void domain_pause_for_debugger(void);
 
 int domain_soft_reset(struct domain *d);
 
-- 
2.16.4



[Xen-devel] [PATCH v3 0/2] xen: make more debugger support code conditional

2019-12-18 Thread Juergen Gross
Support for debugging the hypervisor or guests via gdb/gdbsx should be
configurable.
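
For readers unfamiliar with the pattern, making such support configurable
usually means providing no-op inline stubs when the option is off, so that
callers need no #ifdef of their own. The following is a minimal sketch of
that pattern only. The macro SKETCH_CRASH_DEBUG and both function names are
made up for the example and are not taken from the series.

```c
#include <stdbool.h>

#ifdef SKETCH_CRASH_DEBUG
/* With the option on, a real implementation is linked in from elsewhere. */
bool sketch_debugger_trap(unsigned int vector);
#else
/* With the option off, the hook collapses to a stub that handles nothing. */
static inline bool sketch_debugger_trap(unsigned int vector)
{
    (void)vector;
    return false;
}
#endif

/*
 * The caller is written once, with no conditional compilation.
 * Returns true when the debugger hook consumed the trap.
 */
bool sketch_handle_trap(unsigned int vector)
{
    if (sketch_debugger_trap(vector))
        return true;

    /* Normal trap handling would continue here. */
    return false;
}
```

Built without SKETCH_CRASH_DEBUG, the compiler can drop the hook entirely,
which is the usual reason for preferring stubs over #ifdef at each call site.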

Changes in V3:
- remove possibility to access hypervisor memory via gdbsx domctl
- default gdbsx support to on
- some code moving

Changes in V2:
- split support for gdbstub and gdbsx (Andrew Cooper)

Juergen Gross (2):
  xen: put more code under CONFIG_CRASH_DEBUG
  xen: make gdbsx support configurable

 xen/Kconfig.debug   |  8 +
 xen/arch/x86/Kconfig|  1 -
 xen/arch/x86/Makefile   |  2 +-
 xen/arch/x86/debug.c| 78 +
 xen/arch/x86/domain.c   | 13 +++
 xen/arch/x86/domctl.c   |  4 +++
 xen/arch/x86/hvm/vmx/realmode.c |  1 +
 xen/common/Kconfig  |  3 --
 xen/common/domain.c | 14 
 xen/include/asm-x86/debugger.h  | 34 +++---
 xen/include/xen/sched.h |  1 -
 11 files changed, 58 insertions(+), 101 deletions(-)

-- 
2.16.4



[Xen-devel] [qemu-mainline test] 144940: regressions - FAIL

2019-12-18 Thread osstest service owner
flight 144940 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144940/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-freebsd10-i386 14 guest-saverestore  fail REGR. vs. 144861
 test-amd64-i386-freebsd10-amd64 14 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 13 guest-saverestore fail REGR. vs. 
144861
 test-amd64-amd64-xl-qemuu-win7-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 guest-saverestore fail 
REGR. vs. 144861
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 guest-saverestore fail 
REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 13 guest-saverestore fail REGR. 
vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 13 guest-saverestore fail 
REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 13 guest-saverestore fail 
REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ovmf-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64 13 guest-saverestore fail REGR. vs. 
144861
 test-amd64-i386-xl-qemuu-ovmf-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 13 guest-saverestore fail REGR. 
vs. 144861
 test-amd64-i386-xl-qemuu-win7-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ws16-amd64 13 guest-saverestore fail REGR. vs. 144861

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 144861

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10   fail  like 144861
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 144861
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 144861
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 

[Xen-devel] [ovmf test] 144957: all pass - PUSHED

2019-12-18 Thread osstest service owner
flight 144957 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144957/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf c7a0aca0ed0e9b51efe0c437ff77b30cf1457f8a
baseline version:
 ovmf 01b6090b75922bc72604c334bd3dc331490af3bb

Last test of basis   144927  2019-12-18 09:10:04 Z0 days
Testing same since   144957  2019-12-19 04:17:39 Z0 days1 attempts


People who touched revisions under test:
  Jiewen Yao 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   01b6090b75..c7a0aca0ed  c7a0aca0ed0e9b51efe0c437ff77b30cf1457f8a -> 
xen-tested-master


[Xen-devel] [xen-unstable test] 144936: tolerable FAIL - PUSHED

2019-12-18 Thread osstest service owner
flight 144936 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144936/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 16 guest-localmigrate   fail REGR. vs. 144905
 test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 144905

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 144905
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 144905
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 144905
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 144905
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144905
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 144905
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 144905
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 144905
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144905
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass

version targeted for testing:
 xen  0e7c69bd3c0b35a677d73843b39522787ccf5a3f
baseline version:
 xen  f50a4f6e244cfc8e773300c03aaf4db391f3028a

Last test of basis   144905  2019-12-17 18:36:21 Z1 days
Failing since144924  2019-12-18 06:43:35 Z0 days2 attempts
Testing same since   144936  2019-12-18 16:07:31 Z0 days1 attempts


People who touched revisions under 

Re: [Xen-devel] [PATCH v1] xen-pciback: optionally allow interrupt enable flag writes

2019-12-18 Thread Marek Marczykowski-Górecki
On Tue, Dec 03, 2019 at 04:17:33PM +0100, Roger Pau Monné wrote:
> On Tue, Dec 03, 2019 at 06:41:56AM +0100, Marek Marczykowski-Górecki wrote:
> > QEMU running in a stubdom needs to be able to set INTX_DISABLE, and the
> > MSI(-X) enable flags in the PCI config space. This adds an attribute
> > 'allow_interrupt_control' which when set for a PCI device allows writes
> > to this flag(s). The toolstack will need to set this for stubdoms.
> > When enabled, guest (stubdomain) will be allowed to set relevant enable
> > flags, but only one at a time - i.e. it refuses to enable more than one
> > of INTx, MSI, MSI-X at a time.
> > 
> > This functionality is needed only for config space access done by device
> > model (stubdomain) serving a HVM with the actual PCI device. It is not
> > necessary and unsafe to enable direct access to those bits for PV domain
> > with the device attached. For PV domains, there are separate protocol
> > messages (XEN_PCI_OP_{enable,disable}_{msi,msix}) for this purpose.
> > Those ops in addition to setting enable bits, also configure MSI(-X) in
> > dom0 kernel - which is undesirable for PCI passthrough to HVM guests.
> > 
> > This should not introduce any new security issues since a malicious
> > guest (or stubdom) can already generate MSIs through other ways, see
> > [1] page 8. Additionally, when qemu runs in dom0, it already has direct
> > access to those bits.
> > 
> > This is the second iteration of this feature. The first was proposed as a
> > direct Xen interface through a new hypercall, but ultimately it was
> > rejected by the maintainer, because mixing pciback and hypercalls for
> > PCI config space access isn't a good design. Full discussion at [2].
> > 
> > [1]: 
> > https://invisiblethingslab.com/resources/2011/Software%20Attacks%20on%20Intel%20VT-d.pdf
> > [2]: https://xen.markmail.org/thread/smpgpws4umdzizze
> > 
> > [part of the commit message and sysfs handling]
> > Signed-off-by: Simon Gaiser 
> > [the rest]
> > Signed-off-by: Marek Marczykowski-Górecki 
> > ---
> > I'm not very happy about code duplication regarding MSI/MSI-X/INTx
> > exclusivity test, but I don't have better ideas how to structure it. Any
> > suggestions?
> 
> Can't you create a helper that returns the currently enabled interrupt
> mode?
> 
> I expect returning an enum (ie: NONE, INTX, MSI, MSIX) should be fine
> since no two of those should be enabled at the same time.

Done in v2 (plus ERR member).

> 
> > ---
> >  .../xen/xen-pciback/conf_space_capability.c   | 113 ++
> >  drivers/xen/xen-pciback/conf_space_header.c   |  30 +
> >  drivers/xen/xen-pciback/pci_stub.c|  66 ++
> >  drivers/xen/xen-pciback/pciback.h |   1 +
> >  4 files changed, 210 insertions(+)
> > 
> > diff --git a/drivers/xen/xen-pciback/conf_space_capability.c 
> > b/drivers/xen/xen-pciback/conf_space_capability.c
> > index e5694133ebe5..c5a7c58ff3e3 100644
> > --- a/drivers/xen/xen-pciback/conf_space_capability.c
> > +++ b/drivers/xen/xen-pciback/conf_space_capability.c
> > @@ -189,6 +189,109 @@ static const struct config_field caplist_pm[] = {
> > {}
> >  };
> >  
> > +static struct msi_msix_field_config {
> > +   u16 enable_bit;  /* bit for enabling MSI/MSI-X */
> > +   int other_cap;  /* the other capability for exclusiveness check */
> 
> Nit: just one space between the declaration and the comment IMO.
> 
> Also capability ID is not a signed value, hence unsigned int would
> feel more natural.

Replaced with enum in v2.

> > +} msi_field_config = {
> > +   .enable_bit = PCI_MSI_FLAGS_ENABLE,
> > +   .other_cap = PCI_CAP_ID_MSIX,
> > +}, msix_field_config = {
> > +   .enable_bit = PCI_MSIX_FLAGS_ENABLE,
> > +   .other_cap = PCI_CAP_ID_MSI,
> > +};
> 
> I think it would be more helpful to store the current capability ID
> rather the one you need to check against. Then if you had a helper
> that returns the currently enabled interrupt mode you would have to
> check that either it's NONE or matches the capability requested to be
> enabled.

Done in v2.
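
The two suggestions above (a helper returning the currently enabled mode,
and an "either NONE or the same mode" check) can be sketched in standalone C
as follows. This is an illustration, not the pciback code: struct fake_dev
stands in for real config space reads, the SKETCH_* constants and both
function names are invented here, and error handling is left out. The bit
values match the PCI Command, MSI and MSI-X control registers.

```c
#include <stdint.h>

#define SKETCH_INTX_DISABLE 0x0400u /* Command register, Interrupt Disable */
#define SKETCH_MSI_ENABLE   0x0001u /* MSI Message Control, MSI Enable */
#define SKETCH_MSIX_ENABLE  0x8000u /* MSI-X Message Control, MSI-X Enable */

enum interrupt_type {
    INTERRUPT_TYPE_NONE,
    INTERRUPT_TYPE_INTX,
    INTERRUPT_TYPE_MSI,
    INTERRUPT_TYPE_MSIX,
};

/* Hypothetical stand-in for a device's relevant config space registers. */
struct fake_dev {
    uint16_t command;
    int has_msi;         /* device exposes an MSI capability */
    uint16_t msi_flags;
    int has_msix;        /* device exposes an MSI-X capability */
    uint16_t msix_flags;
};

/* Report which interrupt mode is enabled right now, if any. */
enum interrupt_type current_interrupt_type(const struct fake_dev *dev)
{
    if (!(dev->command & SKETCH_INTX_DISABLE))
        return INTERRUPT_TYPE_INTX;
    if (dev->has_msi && (dev->msi_flags & SKETCH_MSI_ENABLE))
        return INTERRUPT_TYPE_MSI;
    if (dev->has_msix && (dev->msix_flags & SKETCH_MSIX_ENABLE))
        return INTERRUPT_TYPE_MSIX;
    return INTERRUPT_TYPE_NONE;
}

/* Enabling `wanted` is allowed only if nothing else is enabled already. */
int may_enable(const struct fake_dev *dev, enum interrupt_type wanted)
{
    enum interrupt_type cur = current_interrupt_type(dev);

    return cur == INTERRUPT_TYPE_NONE || cur == wanted;
}
```

With one helper answering "what is enabled now?", each write handler only
needs the single comparison in may_enable() rather than its own copy of the
exclusivity test.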

> > +
> > +static void *msi_field_init(struct pci_dev *dev, int offset)
> > +{
> > +   return &msi_field_config;
> > +}
> > +
> > +static void *msix_field_init(struct pci_dev *dev, int offset)
> > +{
> > +   return &msix_field_config;
> > +}
> > +
> > +static int msi_msix_flags_write(struct pci_dev *dev, int offset, u16 
> > new_value,
> > +void *data)
> > +{
> > +   int err;
> > +   u16 old_value;
> > +   struct msi_msix_field_config *field_config = data;
> > +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(dev);
> 
> const for both the above.

Done in v2.

> > +   int other_cap_offset;
> 
> unsigned int

Done in v2.

> > +   u16 other_cap_enable_bit;
> > +   u16 other_cap_value;
> > +
> > +   if (xen_pcibk_permissive || dev_data->permissive)
> > +   goto write;
> > +
> > +   err = pci_read_config_word(dev, offset, &old_value);
> > +   if (err)
> > +   return err;
> > +
> > +   if (new_value 

[Xen-devel] [PATCH v2] xen-pciback: optionally allow interrupt enable flag writes

2019-12-18 Thread Marek Marczykowski-Górecki
QEMU running in a stubdom needs to be able to set INTX_DISABLE, and the
MSI(-X) enable flags in the PCI config space. This adds an attribute
'allow_interrupt_control' which when set for a PCI device allows writes
to this flag(s). The toolstack will need to set this for stubdoms.
When enabled, guest (stubdomain) will be allowed to set relevant enable
flags, but only one at a time - i.e. it refuses to enable more than one
of INTx, MSI, MSI-X at a time.

This functionality is needed only for config space access done by device
model (stubdomain) serving a HVM with the actual PCI device. It is not
necessary and unsafe to enable direct access to those bits for PV domain
with the device attached. For PV domains, there are separate protocol
messages (XEN_PCI_OP_{enable,disable}_{msi,msix}) for this purpose.
Those ops in addition to setting enable bits, also configure MSI(-X) in
dom0 kernel - which is undesirable for PCI passthrough to HVM guests.

This should not introduce any new security issues since a malicious
guest (or stubdom) can already generate MSIs through other ways, see
[1] page 8. Additionally, when qemu runs in dom0, it already has direct
access to those bits.

This is the second iteration of this feature. The first was proposed as a
direct Xen interface through a new hypercall, but ultimately it was
rejected by the maintainer, because mixing pciback and hypercalls for
PCI config space access isn't a good design. Full discussion at [2].

[1]: 
https://invisiblethingslab.com/resources/2011/Software%20Attacks%20on%20Intel%20VT-d.pdf
[2]: https://xen.markmail.org/thread/smpgpws4umdzizze

[part of the commit message and sysfs handling]
Signed-off-by: Simon Gaiser 
[the rest]
Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v2:
 - introduce xen_pcibk_get_interrupt_type() to deduplicate current
   INTx/MSI/MSI-X state check
 - fix checking MSI/MSI-X state on devices not supporting it
---
 drivers/xen/xen-pciback/conf_space.c  | 35 
 drivers/xen/xen-pciback/conf_space.h  | 10 +++
 .../xen/xen-pciback/conf_space_capability.c   | 88 +++
 drivers/xen/xen-pciback/conf_space_header.c   | 19 
 drivers/xen/xen-pciback/pci_stub.c| 66 ++
 drivers/xen/xen-pciback/pciback.h |  1 +
 6 files changed, 219 insertions(+)

diff --git a/drivers/xen/xen-pciback/conf_space.c b/drivers/xen/xen-pciback/conf_space.c
index 60111719b01f..10200a7a2da5 100644
--- a/drivers/xen/xen-pciback/conf_space.c
+++ b/drivers/xen/xen-pciback/conf_space.c
@@ -286,6 +286,41 @@ int xen_pcibk_config_write(struct pci_dev *dev, int offset, int size, u32 value)
 	return xen_pcibios_err_to_errno(err);
 }
 
+enum interrupt_type xen_pcibk_get_interrupt_type(struct pci_dev *dev)
+{
+	int err;
+	u16 val;
+
+	err = pci_read_config_word(dev, PCI_COMMAND, &val);
+	if (err)
+		return INTERRUPT_TYPE_ERR;
+	if (!(val & PCI_COMMAND_INTX_DISABLE))
+		return INTERRUPT_TYPE_INTX;
+
+	/* Do not trust dev->msi(x)_enabled here, as enabling could be done
+	 * bypassing the pci_*msi* functions, by the qemu.
+	 */
+	if (dev->msi_cap) {
+		err = pci_read_config_word(dev,
+				dev->msi_cap + PCI_MSI_FLAGS,
+				&val);
+		if (err)
+			return INTERRUPT_TYPE_ERR;
+		if (val & PCI_MSI_FLAGS_ENABLE)
+			return INTERRUPT_TYPE_MSI;
+	}
+	if (dev->msix_cap) {
+		err = pci_read_config_word(dev,
+				dev->msix_cap + PCI_MSIX_FLAGS,
+				&val);
+		if (err)
+			return INTERRUPT_TYPE_ERR;
+		if (val & PCI_MSIX_FLAGS_ENABLE)
+			return INTERRUPT_TYPE_MSIX;
+	}
+	return INTERRUPT_TYPE_NONE;
+}
+
 void xen_pcibk_config_free_dyn_fields(struct pci_dev *dev)
 {
struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(dev);
diff --git a/drivers/xen/xen-pciback/conf_space.h 
b/drivers/xen/xen-pciback/conf_space.h
index 22db630717ea..b6fff5161331 100644
--- a/drivers/xen/xen-pciback/conf_space.h
+++ b/drivers/xen/xen-pciback/conf_space.h
@@ -65,6 +65,14 @@ struct config_field_entry {
void *data;
 };
 
+enum interrupt_type {
+INTERRUPT_TYPE_ERR = -1,
+INTERRUPT_TYPE_NONE,
+INTERRUPT_TYPE_INTX,
+INTERRUPT_TYPE_MSI,
+INTERRUPT_TYPE_MSIX,
+};
+
 extern bool xen_pcibk_permissive;
 
 #define OFFSET(cfg_entry) ((cfg_entry)->base_offset+(cfg_entry)->field->offset)
@@ -126,4 +134,6 @@ int xen_pcibk_config_capability_init(void);
 int xen_pcibk_config_header_add_fields(struct pci_dev *dev);
 int xen_pcibk_config_capability_add_fields(struct pci_dev *dev);
 
+enum interrupt_type xen_pcibk_get_interrupt_type(struct pci_dev *dev);
+
 #endif /* __XEN_PCIBACK_CONF_SPACE_H__ */
diff --git 

Re: [Xen-devel] [PATCH] [tools/hotplug] Use ip on systems where brctl is not available

2019-12-18 Thread Steven Haigh

On 2019-12-19 02:42, Ian Jackson wrote:

Steven Haigh writes ("[PATCH] [tools/hotplug] Use ip on systems where
brctl is not available"):

Newer distros like CentOS 8 do not have brctl available. As such, we
can't use it to configure networking anymore.

This patch will fall back to 'ip' or 'bridge' commands if brctl is not
available in the working PATH.


This looks good to me at least in the brctl case.  I have two minor
comments.

For the avoidance of doubt, I guess you have tested this in the
`ip'/`bridge' case ?  How thoroughly ? :-)


I have tested it to the point that it's almost a port of the Fedora 
patch - however the Fedora patch removes brctl completely in favour of 
the ip / bridge commands. While I haven't specifically debugged the 
result on Fedora, the networking works successfully when running a 
Domain-0 in Fedora 31 - which was the source of the 'ip' commands to 
run.





-if [ -z "$bridge" ]
-then
-  bridge=$(brctl show | awk 'NR==2{print$1}')
-
+if [ -z "$bridge" ]; then


The presumably-unintentional style change makes the review slightly
harder...


I'm intending to submit a new patch series after this (to make 
backporting this easier) that cleans up formatting / whitespace / syntax 
across the majority of scripts in the Linux directory. It'll look like a 
hot mess when submitting the next lot of patches - but it's better than 
nothing.



-bridge=$(brctl show | cut -d "
+if which brctl >&/dev/null; then


Maybe introduce
   have_brctl () { ... }
so we can say
   if have_brctl; then
?


I don't really have a preference. brctl is used through quite a few 
scripts - none of which really have a standard method of operation or 
common presentation. Some scripts call xen-network-common.sh - some do 
not.


Would I be correct in thinking that your proposal would be to ensure all 
network scripts source xen-network-common.sh? That would be a more 
invasive change for backporting, hence I've tried to keep it as simple 
as possible for now.


Would a restructure of these things be better for something to be 
committed as yet another patch set (after formatting/style cleanups) 
that makes things a little more consistent?
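
For illustration only, the detection helper suggested above could be
modelled roughly as below. This is a sketch in Python purely so it is
self-contained and testable; the hotplug scripts themselves are shell.
The name have_brctl comes from the suggestion quoted above, while
bridge_tool and the injectable "which" parameter are hypothetical names
introduced here, not part of the patch.

```python
import shutil


def have_brctl(which=shutil.which):
    """Return True if a brctl executable is found on PATH."""
    return which("brctl") is not None


def bridge_tool(which=shutil.which):
    """Prefer brctl when present, otherwise fall back to ip."""
    if have_brctl(which):
        return "brctl"
    if which("ip") is not None:
        return "ip"
    raise RuntimeError("neither brctl nor ip found in PATH")
```

Keeping the lookup in one helper means each script asks the same
question the same way, which is the consistency concern raised above.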


--
Steven Haigh

net...@crc.id.au  https://www.crc.id.au

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] REGRESSION: Xen 4.13 RC5 fails to bootstrap Dom0 on ARM

2019-12-18 Thread Roman Shaposhnik
Hi Julien! First of all -- thank you so much for detailed explanations
-- this is very much appreciated.

A few questions still (if you don't mind):

On Wed, Dec 18, 2019 at 2:17 PM Julien Grall  wrote:
>
> Hi Roman,
>
> On 18/12/2019 17:03, Roman Shaposhnik wrote:
> > On Wed, Dec 18, 2019 at 3:50 AM Julien Grall  wrote:
> > So -- nothing boots directly by UEFI -- everything goes through GRUB.
> >
> > However, my understanding is that GRUB will detect devicetree
> > information provided by UEFI (even though devicetree command is
> > supposed to completely replace that). Hence it is possible that Linux
> > relies on some residuals left in memory by GRUB that Xen doesn't pay
> > attention to (but this is a pretty wild speculation only).
>
> While it goes through GRUB, it is a bootloader and will just act as a
> proxy for EFI. So EFI application such as Xen/Linux can still be loaded
> and take advantage of runtime services if present/implemented.

Aha! So then it depends on Xen actually using those EFI services. Which
leads to my first question:
   1. would it be possible to stay completely with just devicetrees information
   by passing efi=no-rs to Xen?

> In fact most people on Arm are using GRUB rather than EFI directly as
> this is more friendly to use.
>
> Regarding the devicetree, Xen and Linux will completely ignore the
> memory nodes in Xen if using EFI. This is because the EFI memory map will
> give you an overview of the platform with the EFI regions included.

Aha! So in that sense it is a bug in Xen after all, right? (that's what you're
referring to when you say you now understand what needs to get fixed).

Thanks,
Roman.


Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork

2019-12-18 Thread Tamas K Lengyel
On Wed, Dec 18, 2019 at 4:02 PM Julien Grall  wrote:
>
> Hi,
>
> On 18/12/2019 22:33, Tamas K Lengyel wrote:
> > On Wed, Dec 18, 2019 at 3:00 PM Julien Grall  wrote:
> >>
> >> Hi Tamas,
> >>
> >> On 18/12/2019 19:40, Tamas K Lengyel wrote:
> >>> Implement hypercall that allows a fork to shed all memory that got 
> >>> allocated
> >>> for it during its execution and re-load its vCPU context from the parent 
> >>> VM.
> >>> This allows the forked VM to reset into the same state the parent VM is 
> >>> in a
> >>> faster way then creating a new fork would be. Measurements show about a 2x
> >>> speedup during normal fuzzing operations. Performance may vary depending 
> >>> how
> >>> much memory got allocated for the forked VM. If it has been completely
> >>> deduplicated from the parent VM then creating a new fork would likely be 
> >>> more
> >>> performant.
> >>>
> >>> Signed-off-by: Tamas K Lengyel 
> >>> ---
> >>>xen/arch/x86/mm/mem_sharing.c | 105 ++
> >>>xen/include/public/memory.h   |   1 +
> >>>2 files changed, 106 insertions(+)
> >>>
> >>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> >>> index e93ad2ec5a..4735a334b9 100644
> >>> --- a/xen/arch/x86/mm/mem_sharing.c
> >>> +++ b/xen/arch/x86/mm/mem_sharing.c
> >>> @@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, 
> >>> struct domain *cd)
> >>>return 0;
> >>>}
> >>>
> >>> +struct gfn_free;
> >>> +struct gfn_free {
> >>> +struct gfn_free *next;
> >>> +struct page_info *page;
> >>> +gfn_t gfn;
> >>> +};
> >>> +
> >>> +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
> >>> +{
> >>> +int rc;
> >>> +
> >>> +struct p2m_domain* p2m = p2m_get_hostp2m(cd);
> >>> +struct gfn_free *list = NULL;
> >>> +struct page_info *page;
> >>> +
> >>> +page_list_for_each(page, >page_list)
> >>
> >> AFAICT, your domain is not paused, so it would be possible to have page
> >> added/remove in that list behind your back.
> >
> > Well, it's not that it's not paused, it's just that I haven't added a
> > sanity check to make sure it is. The toolstack can (and should) pause
> > it, so that sanity check would be warranted.
> I have only read the hypervisor part, so I didn't know what the
> toolstack has done.

I've added the same enforced VM paused operation that is present for
the fork hypercall handler.

>
> >
> >>
> >> You also have multiple loop on the page_list in this function. Given the
> >> number of page_list can be quite big, this is a call for hogging the
> >> pCPU and an RCU lock on the domain vCPU running this call.
> >
> > There is just one loop over page_list itself, the second loop is on
> > the internal list that is being built here which will be a subset. The
> > list itself in fact should be small (in our tests usually <100).
>
> For a first, nothing in this function tells me that there will be only
> 100 pages. But then, I don't think this is right to implement your
> hypercall based only the  "normal" scenario. You should also think about
> the "worst" case scenario.
>
> In this case the worst case scenario is have hundreds of page in page_list.

Well, this is only an experimental system that's completely disabled
by default. Making the assumption that people who make use of it will
know what they are doing I think is fair.

>
> > Granted the list can grow larger, but in those cases its likely better
> > to just discard the fork and create a new one. So in my opinion adding
> > a hypercall continuation to this not needed
>
> How would the caller know it? What would happen if the caller ends up to
> call this with a growing list.

The caller knows by virtue of knowing how long the VM was executed
for. In the use case this is targeted at, the VM executes only for
a couple of seconds at most. Usually much less than that (we get about
~80 resets/s with AFL). During that time it's extremely unlikely you
get more than ~100 pages deduplicated (that is, written to). But
even if there are more pages, it just means the hypercall might take a
bit longer to run for that iteration. I don't see any issue with not
breaking up this hypercall with continuation even under the worst case
situation though. But if others feel that strongly as well about
having to have continuation for this I don't really mind adding it.

>
> >
> >>
> >>> +{
> >>> +mfn_t mfn = page_to_mfn(page);
> >>> +if ( mfn_valid(mfn) )
> >>> +{
> >>> +p2m_type_t p2mt;
> >>> +p2m_access_t p2ma;
> >>> +gfn_t gfn = mfn_to_gfn(cd, mfn);
> >>> +mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> >>> +0, NULL, false);
> >>> +if ( p2m_is_ram(p2mt) )
> >>> +{
> >>> +struct gfn_free *gfn_free;
> >>> +if ( !get_page(page, cd) )
> >>> +goto err_reset;
> >>> +
> >>> +   

Re: [Xen-devel] [PATCH] arm64: xen: Use modern annotations for assembly functions

2019-12-18 Thread Stefano Stabellini
On Wed, 18 Dec 2019, Mark Brown wrote:
> In an effort to clarify and simplify the annotation of assembly functions
> in the kernel new macros have been introduced. These replace ENTRY and
> ENDPROC. Update the annotations in the xen code to the new macros.
> 
> Signed-off-by: Mark Brown 
> ---
> 
> This is part of a wider effort to convert all the arch/arm64 code.
> 
>  arch/arm64/xen/hypercall.S | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/xen/hypercall.S b/arch/arm64/xen/hypercall.S
> index c5f05c4a4d00..305c2274b8eb 100644
> --- a/arch/arm64/xen/hypercall.S
> +++ b/arch/arm64/xen/hypercall.S
> @@ -56,11 +56,11 @@
>  #define XEN_IMM 0xEA1
>  
>  #define HYPERCALL_SIMPLE(hypercall)  \
> -ENTRY(HYPERVISOR_##hypercall)\
> +SYM_FUNC_START(HYPERVISOR_##hypercall)   \

Could you please adjust the tabs so that the '\' is aligned with the
others?

With that change:

Reviewed-by: Stefano Stabellini 


>   mov x16, #__HYPERVISOR_##hypercall; \
>   hvc XEN_IMM;\
>   ret;\
> -ENDPROC(HYPERVISOR_##hypercall)
> +SYM_FUNC_END(HYPERVISOR_##hypercall)
>  
>  #define HYPERCALL0 HYPERCALL_SIMPLE
>  #define HYPERCALL1 HYPERCALL_SIMPLE
> @@ -86,7 +86,7 @@ HYPERCALL2(multicall);
>  HYPERCALL2(vm_assist);
>  HYPERCALL3(dm_op);
>  
> -ENTRY(privcmd_call)
> +SYM_FUNC_START(privcmd_call)
>   mov x16, x0
>   mov x0, x1
>   mov x1, x2
> @@ -109,4 +109,4 @@ ENTRY(privcmd_call)
>*/
>   uaccess_ttbr0_disable x6, x7
>   ret
> -ENDPROC(privcmd_call);
> +SYM_FUNC_END(privcmd_call);
> -- 
> 2.20.1
> 


Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork

2019-12-18 Thread Julien Grall

Hi,

On 18/12/2019 22:33, Tamas K Lengyel wrote:

On Wed, Dec 18, 2019 at 3:00 PM Julien Grall  wrote:


Hi Tamas,

On 18/12/2019 19:40, Tamas K Lengyel wrote:

Implement hypercall that allows a fork to shed all memory that got allocated
for it during its execution and re-load its vCPU context from the parent VM.
This allows the forked VM to reset into the same state the parent VM is in a
faster way then creating a new fork would be. Measurements show about a 2x
speedup during normal fuzzing operations. Performance may vary depending how
much memory got allocated for the forked VM. If it has been completely
deduplicated from the parent VM then creating a new fork would likely be more
performant.

Signed-off-by: Tamas K Lengyel 
---
   xen/arch/x86/mm/mem_sharing.c | 105 ++
   xen/include/public/memory.h   |   1 +
   2 files changed, 106 insertions(+)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index e93ad2ec5a..4735a334b9 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct 
domain *cd)
   return 0;
   }

+struct gfn_free;
+struct gfn_free {
+struct gfn_free *next;
+struct page_info *page;
+gfn_t gfn;
+};
+
+static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
+{
+int rc;
+
+struct p2m_domain* p2m = p2m_get_hostp2m(cd);
+struct gfn_free *list = NULL;
+struct page_info *page;
+
+page_list_for_each(page, >page_list)


AFAICT, your domain is not paused, so it would be possible to have page
added/remove in that list behind your back.


Well, it's not that it's not paused, it's just that I haven't added a
sanity check to make sure it is. The toolstack can (and should) pause
it, so that sanity check would be warranted.
I have only read the hypervisor part, so I didn't know what the 
toolstack has done.






You also have multiple loop on the page_list in this function. Given the
number of page_list can be quite big, this is a call for hogging the
pCPU and an RCU lock on the domain vCPU running this call.


There is just one loop over page_list itself, the second loop is on
the internal list that is being built here which will be a subset. The
list itself in fact should be small (in our tests usually <100).


For a start, nothing in this function tells me that there will be only 
100 pages. But then, I don't think it is right to implement your 
hypercall based only on the "normal" scenario. You should also think about 
the "worst" case scenario.


In this case the worst case scenario is to have hundreds of pages in page_list.


Granted the list can grow larger, but in those cases its likely better
to just discard the fork and create a new one. So in my opinion adding
a hypercall continuation to this not needed


How would the caller know it? What would happen if the caller ends up 
calling this with a growing list?







+{
+mfn_t mfn = page_to_mfn(page);
+if ( mfn_valid(mfn) )
+{
+p2m_type_t p2mt;
+p2m_access_t p2ma;
+gfn_t gfn = mfn_to_gfn(cd, mfn);
+mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
+0, NULL, false);
+if ( p2m_is_ram(p2mt) )
+{
+struct gfn_free *gfn_free;
+if ( !get_page(page, cd) )
+goto err_reset;
+
+/*
+ * We can't free the page while iterating over the page_list
+ * so we build a separate list to loop over.
+ *
+ * We want to iterate over the page_list instead of checking
+ * gfn from 0 to max_gfn because this is ~10x faster.
+ */
+gfn_free = xmalloc(struct gfn_free);


If I did the math right, for a 4G guest this will require at ~24MB of
memory. Actually, is it really necessary to do the allocation for a
short period of time?
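
The ~24MB figure can be reproduced with back-of-the-envelope arithmetic.
The sketch below assumes 4KiB pages, a 64-bit build where each of the
three struct gfn_free fields takes 8 bytes, no allocator overhead, and
the worst case where every page of the guest ends up on the list. Those
assumptions are not stated in the thread.

```python
GUEST_BYTES = 4 * 1024**3   # 4G guest
PAGE_SIZE = 4096            # assumed 4KiB pages
ENTRY_BYTES = 3 * 8         # next pointer + page pointer + gfn, 8 bytes each (assumed)

pages = GUEST_BYTES // PAGE_SIZE    # number of guest pages
list_bytes = pages * ENTRY_BYTES    # memory for one list entry per page
list_mib = list_bytes / 1024**2     # same figure expressed in MiB

print(pages, list_bytes, list_mib)
```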


If you have a fully deduplicated fork then you should not be using
this function to begin with. You get better performance my throwing
that one away and creating a new one.


How does a user know when/how this can be called? But then, as said above, 
this may be called by mistake... So I still think you need to be prepared 
for the worst case.



As for using xmalloc here, I'm
not sure what other way I have to build a list of pages that need to
be freed. I can't free the page itself while I'm iterating on
page_list (that I'm aware of). The only other option available is
calling __get_gfn_type_access with gfn=0..max_gfn which will be
extremely slow because you have to loop over a lot of holes.
You can use page_list_for_each_safe(). This is already used by functions 
such as relinquish_memory().
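
The general idea behind a "safe" list walk is to cache the successor
before the loop body runs, so the body may unlink the current element.
The sketch below models that idiom in Python. It is not the Xen macro
itself; PageList, Node and for_each_safe are hypothetical names used
only for illustration.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class PageList:
    """Tiny doubly linked list standing in for a domain's page list."""

    def __init__(self, values=()):
        self.head = None
        self.tail = None
        for value in values:
            self.append(value)

    def append(self, value):
        node = Node(value)
        node.prev = self.tail
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def unlink(self, node):
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev
        # Clear the links so a stale walk through this node would stop early.
        node.prev = node.next = None

    def for_each_safe(self):
        node = self.head
        while node is not None:
            nxt = node.next  # cache the successor before the body can unlink node
            yield node
            node = nxt

    def values(self):
        out = []
        node = self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out


pages = PageList(range(6))
for page in pages.for_each_safe():
    if page.value % 2 == 0:
        pages.unlink(page)  # removing the current element is fine here
remaining = pages.values()
```

With this pattern no side list needs to be allocated just to remember
which elements to drop after the walk.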






What are you trying to achieve by iterating twice on the GFN? Wouldn't
it be easier to pause the domain?


I'm not sure what you mean, where do 

Re: [Xen-devel] [PATCH] tools/python: Python 3 compatibility

2019-12-18 Thread Marek Marczykowski-Górecki
On Wed, Dec 18, 2019 at 10:32:47PM +, Andrew Cooper wrote:
> On 18/12/2019 22:26, Marek Marczykowski-Górecki wrote:
> >> @@ -70,7 +73,7 @@ class VM(object):
> >>  
> >>  # libxl
> >>  self.libxl = fmt == "libxl"
> >> -self.emu_xenstore = "" # NUL terminated key pairs from 
> >> "toolstack" records
> >> +self.emu_xenstore = b"" # NUL terminated key pairs from 
> >> "toolstack" records
> >>  
> >>  def write_libxc_ihdr():
> >>  stream_write(pack(libxc.IHDR_FORMAT,
> > You also need to update write_record (string constants).
> > And few calls to it with string constants (write_libxl_end,
> > write_libxl_libxc_context, read_pv_tail, read_hvm_tail).
> > And blkid == ... in read_pv_extended_info().
> 
> Urgh - well spotted.
> 
> Was this manual inspection, or something else? 

Manual search for " and '.

> (I probably should
> complete and upstream write-legacy-stream for the purpose of dev-testing
> the convert-legacy-stream script now that 4.6 is waaay in the past.)
> 
> ~Andrew

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: [Xen-devel] REGRESSION: Xen 4.13 RC5 fails to bootstrap Dom0 on ARM

2019-12-18 Thread Julien Grall

Hi,

On 18/12/2019 17:09, Roman Shaposhnik wrote:

Hi,

On Wed, Dec 18, 2019 at 4:56 AM Julien Grall  wrote:

So that is, in fact, my first question -- why is Xen not showing
available memory in xl info?


I am not entirely sure what exact information you want.

The output you dumped above contains the available memory
(see "free_memory").

Are you looking for something different?


Just to be clear: I was giving 2G via devicetrees (the same devicetrees
that would make Linux detect 2G of RAM), hence I was expecting xl info
to show that. Instead I only got 1120M shown by xl info.


On 18/12/2019 00:04, Roman Shaposhnik wrote:

  memory {
  device_type = "memory";
  reg = <0x0 0x0 0x0 0x5e0 0x0 0x5f0 0x0 0x1000
0x0 0x5f02000 0x0 0xefd000 0x0 0x6e0 0x0 0x60f000 0x0 0x741
0x0 0x1aaf 0x0 0x21f0 0x0 0x10 0x0 0x2200 0x0
0x1c00>;
  };

  reserved-memory {
  ranges;
  #size-cells = <0x2>;
  #address-cells = <0x2>;

  ramoops@21f0 {
  ftrace-size = <0x2>;
  console-size = <0x2>;
  reg = <0x0 0x21f0 0x0 0x10>;
  record-size = <0x2>;
  compatible = "ramoops";
  };

  linux,cma {
  linux,cma-default;
  reusable;
  size = <0x0 0x800>;
  compatible = "shared-dma-pool";
  };
  };

If you look at the REG -- it does now add up to 2Gb, but booting Xen
with it has exactly the
same effect as booting it with: reg = <0x0 0x0 0x0 0x8000>;\


If you boot Xen using EFI, the memory information will come from EFI and
the DT node will be ignored. So unless UEFI is able to pick up the
modification of the DT memory node, modifying the DT is not going to
affect anything.


That's a good point, but given that I always go through GRUB, I was
expecting devicetree command to completely overshadow whatever
information UEFI may have. Am I wrong?


GRUB will load Xen/Linux as an EFI application. Both of them will ignore 
the memory nodes when booting using EFI. For more details, see the 
answer I wrote separately.






I am attaching a full log, and I see the following in the logs:

(XEN) Allocating 1:1 mappings totalling 720MB for dom0:
(XEN) BANK[0] 0x000800-0x001c00 (320MB)
(XEN) BANK[1] 0x004000-0x005800 (384MB)
(XEN) BANK[2] 0x007b00-0x007c00 (16MB)

Which sort of makes sense, I guess -- but I still don't understand
where all these ranges
are coming from and how come Xen doesn't see the full 2Gb even with various
devicetrees I tried.


The ranges above describe the memory ranges given to Dom0. For all the
memory given to Xen, you want to look at the top of your log:

(XEN) Checking for initrd in /chosen
(XEN) RAM:  - 05df
(XEN) RAM: 05f0 - 06dfefff
(XEN) RAM: 06e0 - 0740efff
(XEN) RAM: 0741 - 1db8dfff
(XEN) RAM: 350f - 3dbd2fff
(XEN) RAM: 3dbd3000 - 3dff
(XEN) RAM: 4000 - 5a653fff
(XEN) RAM: 7ada - 7ada3fff
(XEN) RAM: 7aea8000 - 7afa9fff
(XEN) RAM: 7afaa000 - 7ec73fff
(XEN) RAM: 7ec74000 - 7fdddfff
(XEN) RAM: 7fdde000 - 7fea5fff
(XEN) RAM: 7fea6000 - 7ff6dfff
(XEN) RAM: 7000 - 7fff

Looking at the differences with the Linux logs, there is indeed some
memory not detected by Xen.

On Xen, we only consider as usable memory any EFI descriptor of type
EfiConventionalMemory, EfiBootServicesCode or EfiBootServicesData.

Linux includes more types here, so this may explain why we see a difference.

While looking at it, I have also noticed that we don't seem to care
about the memory attribute. I suspect this could be another latent issue
in Xen if the attribute does not match.
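
As a rough illustration of how the set of accepted descriptor types
changes the amount of RAM seen, here is a small sketch. The three types
in XEN_USABLE_TYPES are the ones named above. The sample memory map, the
Descriptor tuple, the usable_bytes helper and the choice of
EfiLoaderData as an extra accepted type are all made up for this
example and are not taken from the HiKey logs.

```python
from collections import namedtuple

# Hypothetical, simplified view of an EFI memory descriptor.
Descriptor = namedtuple("Descriptor", "type start num_pages")

EFI_PAGE_SIZE = 4096  # EFI memory map sizes are counted in 4KiB pages

XEN_USABLE_TYPES = {
    "EfiConventionalMemory",
    "EfiBootServicesCode",
    "EfiBootServicesData",
}


def usable_bytes(memory_map, usable_types):
    """Sum the size of every descriptor whose type is accepted as RAM."""
    return sum(d.num_pages * EFI_PAGE_SIZE
               for d in memory_map if d.type in usable_types)


# Made-up map: 64 MiB + 16 MiB + 8 MiB + 1 MiB.
SAMPLE_MAP = [
    Descriptor("EfiConventionalMemory", 0x00000000, 0x4000),
    Descriptor("EfiLoaderData", 0x04000000, 0x1000),
    Descriptor("EfiBootServicesData", 0x05000000, 0x800),
    Descriptor("EfiRuntimeServicesData", 0x05800000, 0x100),
]

xen_mib = usable_bytes(SAMPLE_MAP, XEN_USABLE_TYPES) // 2**20
wider_mib = usable_bytes(SAMPLE_MAP, XEN_USABLE_TYPES | {"EfiLoaderData"}) // 2**20
print(xen_mib, wider_mib)
```

Any region whose type is outside the accepted set simply disappears
from the total, which is consistent with the gap between the two logs.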


Anything I can do to help debug this? I can run any kind of debug builds, etc.
if needed.


Thank you for the offer, I think I have a good understanding of the 
problem now. So debug should not be necessary.


However, I would appreciate if anyone could help to write a patch for it.



I mean -- at this point it would be really great to get HiKey back to the status
of Xen-on-ARM developer board.


Any ideas here would be greatly appreciated!

Thanks,
Roman.

P.S. Any guess at what these mean?

(XEN) traps.c:1973:d0v0 HSR=0x93880006 pc=0x008738
gva=0x872f2000 gpa=0x0f
(XEN) traps.c:1973:d0v0 HSR=0x93880006 pc=0x00b734e558
gva=0xb72eb000 gpa=0x0f
(XEN) traps.c:1973:d0v0 HSR=0x93880006 pc=0x008f9d2558
gva=0x8f96f000 gpa=0x0f


It means 

Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork

2019-12-18 Thread Tamas K Lengyel
On Wed, Dec 18, 2019 at 3:00 PM Julien Grall  wrote:
>
> Hi Tamas,
>
> On 18/12/2019 19:40, Tamas K Lengyel wrote:
> > Implement hypercall that allows a fork to shed all memory that got allocated
> > for it during its execution and re-load its vCPU context from the parent VM.
> > This allows the forked VM to reset into the same state the parent VM is in a
> > faster way then creating a new fork would be. Measurements show about a 2x
> > speedup during normal fuzzing operations. Performance may vary depending how
> > much memory got allocated for the forked VM. If it has been completely
> > deduplicated from the parent VM then creating a new fork would likely be 
> > more
> > performant.
> >
> > Signed-off-by: Tamas K Lengyel 
> > ---
> >   xen/arch/x86/mm/mem_sharing.c | 105 ++
> >   xen/include/public/memory.h   |   1 +
> >   2 files changed, 106 insertions(+)
> >
> > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > index e93ad2ec5a..4735a334b9 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct 
> > domain *cd)
> >   return 0;
> >   }
> >
> > +struct gfn_free;
> > +struct gfn_free {
> > +struct gfn_free *next;
> > +struct page_info *page;
> > +gfn_t gfn;
> > +};
> > +
> > +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
> > +{
> > +int rc;
> > +
> > +struct p2m_domain* p2m = p2m_get_hostp2m(cd);
> > +struct gfn_free *list = NULL;
> > +struct page_info *page;
> > +
> > +page_list_for_each(page, >page_list)
>
> AFAICT, your domain is not paused, so it would be possible to have page
> added/remove in that list behind your back.

Well, it's not that it's not paused, it's just that I haven't added a
sanity check to make sure it is. The toolstack can (and should) pause
it, so that sanity check would be warranted.

>
> You also have multiple loop on the page_list in this function. Given the
> number of page_list can be quite big, this is a call for hogging the
> pCPU and an RCU lock on the domain vCPU running this call.

There is just one loop over page_list itself, the second loop is on
the internal list that is being built here which will be a subset. The
list itself in fact should be small (in our tests usually <100).
Granted the list can grow larger, but in those cases it's likely better
to just discard the fork and create a new one. So in my opinion adding
a hypercall continuation to this is not needed.

>
> > +{
> > +mfn_t mfn = page_to_mfn(page);
> > +if ( mfn_valid(mfn) )
> > +{
> > +p2m_type_t p2mt;
> > +p2m_access_t p2ma;
> > +gfn_t gfn = mfn_to_gfn(cd, mfn);
> > +mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> > +0, NULL, false);
> > +if ( p2m_is_ram(p2mt) )
> > +{
> > +struct gfn_free *gfn_free;
> > +if ( !get_page(page, cd) )
> > +goto err_reset;
> > +
> > +/*
> > + * We can't free the page while iterating over the 
> > page_list
> > + * so we build a separate list to loop over.
> > + *
> > + * We want to iterate over the page_list instead of 
> > checking
> > + * gfn from 0 to max_gfn because this is ~10x faster.
> > + */
> > +gfn_free = xmalloc(struct gfn_free);
>
> If I did the math right, for a 4G guest this will require at ~24MB of
> memory. Actually, is it really necessary to do the allocation for a
> short period of time?

If you have a fully deduplicated fork then you should not be using
this function to begin with. You get better performance by throwing
that one away and creating a new one. As for using xmalloc here, I'm
not sure what other way I have to build a list of pages that need to
be freed. I can't free the page itself while I'm iterating on
page_list (that I'm aware of). The only other option available is
calling __get_gfn_type_access with gfn=0..max_gfn which will be
extremely slow because you have to loop over a lot of holes.

>
> What are you trying to achieve by iterating twice on the GFN? Wouldn't
> it be easier to pause the domain?

I'm not sure what you mean, where do you see me iterating twice on the
gfn? And what does pausing have to do with it?

Than


Re: [Xen-devel] [PATCH] tools/python: Python 3 compatibility

2019-12-18 Thread Andrew Cooper
On 18/12/2019 22:26, Marek Marczykowski-Górecki wrote:
>> @@ -70,7 +73,7 @@ class VM(object):
>>  
>>  # libxl
>>  self.libxl = fmt == "libxl"
>> -self.emu_xenstore = "" # NUL terminated key pairs from 
>> "toolstack" records
>> +self.emu_xenstore = b"" # NUL terminated key pairs from 
>> "toolstack" records
>>  
>>  def write_libxc_ihdr():
>>  stream_write(pack(libxc.IHDR_FORMAT,
> You also need to update write_record (string constants).
> And few calls to it with string constants (write_libxl_end,
> write_libxl_libxc_context, read_pv_tail, read_hvm_tail).
> And blkid == ... in read_pv_extended_info().

Urgh - well spotted.

Was this manual inspection, or something else?  (I probably should
complete and upstream write-legacy-stream for the purpose of dev-testing
the convert-legacy-stream script now that 4.6 is waaay in the past.)

~Andrew


Re: [Xen-devel] [PATCH] tools/python: Python 3 compatibility

2019-12-18 Thread Marek Marczykowski-Górecki
On Wed, Dec 18, 2019 at 03:05:22PM +, Andrew Cooper wrote:
> convert-legacy-stream is only used for incoming migration from pre Xen 4.7,
> and verify-stream-v2 appears to only be used by me during migration
> development - it is little surprise that they missed the main conversion
> effort in Xen 4.13.
> 
> Fix it all up.
> 
> Move open_file_or_fd() into a new util.py to avoid duplication, making it a
> more generic wrapper around open() or fdopen().
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Ian Jackson 
> CC: Wei Liu 
> 
> This needs backporting to 4.13 ASAP
> ---
>  tools/python/scripts/convert-legacy-stream | 49 
> +++---
>  tools/python/scripts/verify-stream-v2  | 43 +-
>  tools/python/xen/migration/libxc.py|  2 +-
>  tools/python/xen/migration/libxl.py|  2 +-
>  tools/python/xen/migration/verify.py   |  4 +--
>  tools/python/xen/util.py   | 23 ++
>  6 files changed, 46 insertions(+), 77 deletions(-)
>  create mode 100644 tools/python/xen/util.py
> 
> diff --git a/tools/python/scripts/convert-legacy-stream 
> b/tools/python/scripts/convert-legacy-stream
> index 5f80f13654..b0d81aa92e 100755
> --- a/tools/python/scripts/convert-legacy-stream
> +++ b/tools/python/scripts/convert-legacy-stream
> @@ -5,6 +5,8 @@
>  Convert a legacy migration stream to a v2 stream.
>  """
>  
> +from __future__ import print_function
> +
>  import sys
>  import os, os.path
>  import syslog
> @@ -12,6 +14,7 @@ import traceback
>  
>  from struct import calcsize, unpack, pack
>  
> +from xen.util import open_file_or_fd as open_file_or_fd
>  from xen.migration import legacy, public, libxc, libxl, xl
>  
>  __version__ = 1
> @@ -39,16 +42,16 @@ def info(msg):
>  for line in msg.split("\n"):
>  syslog.syslog(syslog.LOG_INFO, line)
>  else:
> -print msg
> +print(msg)
>  
>  def err(msg):
>  """Error message, routed to appropriate destination"""
>  if log_to_syslog:
>  for line in msg.split("\n"):
>  syslog.syslog(syslog.LOG_ERR, line)
> -print >> sys.stderr, msg
> +print(msg, file = sys.stderr)
>  
> -class StreamError(StandardError):
> +class StreamError(Exception):
>  """Error with the incoming migration stream"""
>  pass
>  
> @@ -70,7 +73,7 @@ class VM(object):
>  
>  # libxl
>  self.libxl = fmt == "libxl"
> -self.emu_xenstore = "" # NUL terminated key pairs from 
> "toolstack" records
> +self.emu_xenstore = b"" # NUL terminated key pairs from 
> "toolstack" records
>  
>  def write_libxc_ihdr():
>  stream_write(pack(libxc.IHDR_FORMAT,

You also need to update write_record (string constants).
And a few calls to it with string constants (write_libxl_end,
write_libxl_libxc_context, read_pv_tail, read_hvm_tail).
And blkid == ... in read_pv_extended_info().
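
To illustrate why the remaining string constants matter, here is a small
standalone sketch of how bytes and str behave under Python 3. The
variable names are made up for the example; only the signature string
and the NUL-join pattern come from the patch. Note the indexing case:
on Python 3, indexing a bytes object yields an int, so a comparison of
the form name[-1] != b'\x00' (as in the read_libxl_toolstack hunk) may
be worth double-checking and could need a slice instead.

```python
from struct import unpack

# unpack() with an "s" format returns bytes on Python 3.
sig, = unpack("21s", b"DeviceModelRecord0002")
str_match = sig == "DeviceModelRecord0002"     # bytes vs str never compare equal
bytes_match = sig == b"DeviceModelRecord0002"  # bytes vs bytes compares content

name = b"physmap\x00"
last_item = name[-1]                # indexing bytes gives an int on Python 3
index_cmp = name[-1] == b"\x00"     # int vs bytes: not equal
slice_cmp = name[-1:] == b"\x00"    # a one-byte slice stays bytes

# join() needs the separator and the items to be the same type.
joined = b"\x00".join([b"key", b"val"]) + b"\x00"
```

Slicing (name[-1:]) behaves the same on Python 2 and Python 3, which
makes it a convenient form for code that has to run on both.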

> @@ -336,7 +339,7 @@ def read_libxl_toolstack(vm, data):
>  if twidth == 64:
>  name = name[:-4]
>  
> -if name[-1] != '\x00':
> +if name[-1] != b'\x00':
>  raise StreamError("physmap name not NUL terminated")
>  
>  root = "physmap/%x" % (phys,)
> @@ -347,7 +350,7 @@ def read_libxl_toolstack(vm, data):
>  for key, val in zip(kv[0::2], kv[1::2]):
>  info("'%s' = '%s'" % (key, val))
>  
> -vm.emu_xenstore += '\x00'.join(kv) + '\x00'
> +vm.emu_xenstore += b'\x00'.join(kv) + b'\x00'
>  
>  
>  def read_chunks(vm):
> @@ -534,7 +537,7 @@ def read_qemu(vm):
>  sig, = unpack("21s", rawsig)
>  info("Qemu signature: %s" % (sig, ))
>  
> -if sig == "DeviceModelRecord0002":
> +if sig == b"DeviceModelRecord0002":
>  rawsz = rdexact(4)
>  sz, = unpack("I", rawsz)
>  qdata = rdexact(sz)
> @@ -617,36 +620,6 @@ def read_legacy_stream(vm):
>  return 2
>  return 0
>  
> -def open_file_or_fd(val, mode):
> -"""
> -If 'val' looks like a decimal integer, open it as an fd.  If not, try to
> -open it as a regular file.
> -"""
> -
> -fd = -1
> -try:
> -# Does it look like an integer?
> -try:
> -fd = int(val, 10)
> -except ValueError:
> -pass
> -
> -# Try to open it...
> -if fd != -1:
> -return os.fdopen(fd, mode, 0)
> -else:
> -return open(val, mode, 0)
> -
> -except StandardError, e:
> -if fd != -1:
> -err("Unable to open fd %d: %s: %s" %
> -(fd, e.__class__.__name__, e))
> -else:
> -err("Unable to open file '%s': %s: %s" %
> -(val, e.__class__.__name__, e))
> -
> -raise SystemExit(1)
> -
>  
>  def main():
>  from optparse import OptionParser
> @@ -723,7 +696,7 @@ def main():
>  if __name__ == "__main__":
>  try:
>  sys.exit(main())
> -except SystemExit, e:
> +except SystemExit as e:
>
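
A note on the name[-1] hunk above: on Python 3, indexing a bytes object
yields an int, so name[-1] != b'\x00' is true even for a correctly
terminated name. A minimal sketch of the difference, independent of the
Xen tree (the helper name and the sample physmap name are made up for
illustration):

```python
def nul_terminated(buf):
    """Return True if buf ends with a NUL byte.

    Slicing (buf[-1:]) keeps the bytes type on Python 3, whereas
    buf[-1] would be an int there and never equal b"\\x00".
    """
    return buf[-1:] == b"\x00"


# Hypothetical sample value, standing in for a name read from the stream.
name = b"vga.vram\x00"

# On Python 3 the last element is the integer 0, not b"\x00".
last = name[-1]
print(type(last).__name__, last != b"\x00")

# The slice-based check gives the intended answer.
print(nul_terminated(name), nul_terminated(b"vga.vram"))
```

Comparing a one-byte slice, or using name.endswith(b'\x00'), keeps the
check bytes-to-bytes and behaves the same on Python 2.7.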

Re: [Xen-devel] [PATCH v2 6/6] x86: implement Hyper-V clock source

2019-12-18 Thread Wei Liu
On Wed, 18 Dec 2019 at 20:24, Michael Kelley  wrote:
>
> From: Durrant, Paul  Sent: Wednesday, December 18, 2019 
> 7:24 AM
>
> > > From: Wei Liu  On Behalf Of Wei Liu
> > > Sent: 18 December 2019 14:43
>
> [snip]
>
> > > +
> > > +static inline uint64_t read_hyperv_timer(void)
> > > +{
> > > +uint64_t scale, offset, ret, tsc;
> > > +uint32_t seq;
> > > +const struct ms_hyperv_tsc_page *tsc_page = hyperv_tsc;
> > > +
> > > +do {
> > > +seq = tsc_page->tsc_sequence;
> > > +
> > > +/* Seq 0 is special. It means the TSC enlightenment is not
> > > + * available at the moment. The reference time can only be
> > > + * obtained from the Reference Counter MSR.
> > > + */
> > > +if ( seq == 0 )
> >
> > Older versions of the spec used to use 0x I think, although when I 
> > look again they
> > seem to have been retro-actively fixed. In any case I think you should 
> > treat both
> > 0x and 0 as invalid.
>
> FWIW, the 0x was just a bug in the spec.  Hyper-V implementations only
> set the value to 0 to indicate invalid.  The equivalent Linux code checks 
> only for 0.
>

Thanks for chiming in, Michael.

In that case I will submit a fix to change Xen's viridian code to
remove the wrong value there.

Wei.

> Michael
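
For readers following the thread, the read loop under discussion can be
modelled as below. This is only a sketch under stated assumptions, not
the Xen or Linux code: the tsc_scale and tsc_offset field names, the two
callback parameters, and the ((tsc * scale) >> 64) + offset formula are
taken from my reading of the Hyper-V reference TSC page definition
rather than from the hunk quoted above, which only shows tsc_sequence.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


@dataclass
class TscPage:
    """Stand-in for the hypervisor-provided reference TSC page."""
    tsc_sequence: int
    tsc_scale: int    # assumed field name
    tsc_offset: int   # assumed field name


def read_hyperv_timer(page, read_tsc, read_ref_counter_msr):
    """Return the reference time, retrying if the page changes mid-read.

    read_tsc and read_ref_counter_msr are hypothetical callbacks that
    stand in for RDTSC and the Reference Counter MSR read.
    """
    while True:
        seq = page.tsc_sequence
        # Sequence 0 means the TSC page is not usable right now:
        # fall back to the Reference Counter MSR.
        if seq == 0:
            return read_ref_counter_msr()
        scale = page.tsc_scale
        offset = page.tsc_offset
        tsc = read_tsc()
        # Only trust the values if the sequence did not change.
        if page.tsc_sequence == seq:
            break
    return (((tsc * scale) >> 64) + offset) & MASK64


page = TscPage(tsc_sequence=1, tsc_scale=1 << 63, tsc_offset=10)
print(read_hyperv_timer(page, lambda: 1000, lambda: 7))
```

Only 0 is treated as the invalid sequence value here, matching what
Michael describes for Hyper-V implementations.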

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 11/20] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool

2019-12-18 Thread Tamas K Lengyel
On Wed, Dec 18, 2019 at 2:29 PM Julien Grall  wrote:
>
> Hi Tamas,
>
> On 18/12/2019 19:40, Tamas K Lengyel wrote:
> > MEM_SHARING_DESTROY_GFN is used on the 'flags' bitfield during unsharing.
> > However, the bitfield is not used for anything else, so just convert it to a
> > bool instead.
> >
> > Signed-off-by: Tamas K Lengyel 
> > ---
> >   xen/arch/x86/mm/mem_sharing.c | 7 +++
> >   xen/arch/x86/mm/p2m.c | 1 +
> >   xen/common/memory.c   | 2 +-
> >   xen/include/asm-x86/mem_sharing.h | 5 ++---
> >   4 files changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > index fc1d8be1eb..6e81e1a895 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1175,7 +1175,7 @@ err_out:
> >*/
> >   int __mem_sharing_unshare_page(struct domain *d,
> >  unsigned long gfn,
> > -   uint16_t flags)
> > +   bool destroy)
> >   {
> >   p2m_type_t p2mt;
> >   mfn_t mfn;
> > @@ -1231,7 +1231,7 @@ int __mem_sharing_unshare_page(struct domain *d,
> >* If the GFN is getting destroyed drop the references to MFN
> >* (possibly freeing the page), and exit early.
> >*/
> > -if ( flags & MEM_SHARING_DESTROY_GFN )
> > +if ( destroy )
> >   {
> >   if ( !last_gfn )
> >   mem_sharing_gfn_destroy(page, d, gfn_info);
> > @@ -1321,8 +1321,7 @@ int relinquish_shared_pages(struct domain *d)
> >   if ( mfn_valid(mfn) && p2m_is_shared(t) )
> >   {
> >   /* Does not fail with ENOMEM given the DESTROY flag */
> > -BUG_ON(__mem_sharing_unshare_page(d, gfn,
> > -   MEM_SHARING_DESTROY_GFN));
> > +BUG_ON(__mem_sharing_unshare_page(d, gfn, true));
> >   /*
> >* Clear out the p2m entry so no one else may try to
> >* unshare.  Must succeed: we just read the old entry and
> > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> > index baea632acc..53ea44fe3c 100644
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -517,6 +517,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, 
> > unsigned long gfn_l,
> >*/
> >   if ( mem_sharing_unshare_page(p2m->domain, gfn_l) < 0 )
> >   mem_sharing_notify_enomem(p2m->domain, gfn_l, false);
> > +
>
> This line looks spurious.

Yeap.

>
> >   mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
> >   }
> >
> > diff --git a/xen/common/memory.c b/xen/common/memory.c
> > index 309e872edf..c7d2bac452 100644
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -352,7 +352,7 @@ int guest_remove_page(struct domain *d, unsigned long 
> > gmfn)
> >* might be the only one using this shared page, and we need to
> >* trigger proper cleanup. Once done, this is like any other page.
> >*/
> > -rc = mem_sharing_unshare_page(d, gmfn, 0);
> > +rc = mem_sharing_unshare_page(d, gmfn);
>
> AFAICT, this patch does not reduce the number of parameters for
> mem_sharing_unshare_page(). Did you intend to make this change in
> another patch?

Ah yea, it should have been dropped in patch 6 of the series.

>
> >   if ( rc )
> >   {
> >   mem_sharing_notify_enomem(d, gmfn, false);
> > diff --git a/xen/include/asm-x86/mem_sharing.h 
> > b/xen/include/asm-x86/mem_sharing.h
> > index 89cdaccea0..4b982a4803 100644
> > --- a/xen/include/asm-x86/mem_sharing.h
> > +++ b/xen/include/asm-x86/mem_sharing.h
> > @@ -76,17 +76,16 @@ struct page_sharing_info
> >   unsigned int mem_sharing_get_nr_saved_mfns(void);
> >   unsigned int mem_sharing_get_nr_shared_mfns(void);
> >
> > -#define MEM_SHARING_DESTROY_GFN   (1<<1)
> >   /* Only fails with -ENOMEM. Enforce it with a BUG_ON wrapper. */
> >   int __mem_sharing_unshare_page(struct domain *d,
> >  unsigned long gfn,
> > -   uint16_t flags);
> > +   bool destroy);
> >
> >   static inline
> >   int mem_sharing_unshare_page(struct domain *d,
> >unsigned long gfn)
> >   {
> > -int rc = __mem_sharing_unshare_page(d, gfn, 0);
> > +int rc = __mem_sharing_unshare_page(d, gfn, false);
> >   BUG_ON(rc && (rc != -ENOMEM));
> >   return rc;
> >   }
> >
>
> Cheers,

Thanks,
Tamas


Re: [Xen-devel] Xen ARM Dom0less passthrough without IOMMU

2019-12-18 Thread Stefano Stabellini
On Wed, 18 Dec 2019, Julien Grall wrote:
> Hi Stefano,
> 
> On 17/12/2019 18:28, Stefano Stabellini wrote:
> > > Then I tried to passthrough the eMMC, but I got the following
> > > error:
> > > (XEN) DOM1: [0.879151] sdhci-esdhc-imx 4005d000.usdhc: can't request
> > > region for resource [mem 0x4005d000-0x4005dfff]
> > > (XEN) DOM1: [0.891137] sdhci-esdhc-imx 4005d000.usdhc:
> > > sdhci_pltfm_init failed -16
> > > (XEN) DOM1: [0.900249] sdhci-esdhc-imx: probe of 4005d000.usdhc failed
> > > with error -16
> > > 
> > > Where 0x4005d000 is the physical address of the uSDHC(eMMC) node in the
> > > DT.
> > > It seems that the DomU1 kernel does not have access to that memory zone.
> > 
> > It looks like drivers/mmc/host/sdhci-pltfm.c:sdhci_pltfm_init failed,
> > but I cannot see a simple reason why it would. As Julien mentioned the
> > device tree snippet would be useful. Also the domU config and the full
> > device tree would be useful. i.e. did you add "xen,passthrough;" under
> > the related uSDHC node on the host device tree?
> 
> The only purpose of "xen,passthrough" is to mark the device as disabled in
> the Dom0 DT. It does not affect how the device is passed through to a guest.
> 
> In this case, I don't believe the problem is DT related because Linux is able
> to find the regions. If the region were not mapped to the guest, it would
> likely result in a data abort later on.
> 
> Looking at Andrei's e-mail again, he doesn't mention anything about the 1:1
> mapping. So I assume he is still using the guest memory layout. The physical
> address 0x4005d000 is roughly 372KB into the first RAM bank for the
> guest.
> 
> > > I'm trying to passthrough the eMMC in order to mount DomU1's root
> > > on a SDCard partition, because I couldn't get to DomU1's Linux prompt
> > > when I tried to boot with a ramdisk module. I always get this error:
> > > (XEN) DOM1: [1.544199] RAMDISK: Couldn't find valid RAM disk image
> > > starting at 0.
> > > 
> > > Could this be because the ramdisk is too big? The smallest I've tried with
> > > Is approximately 60MB in size. What size are the ramdisks that you
> > > are using in your dom0less booting demos?
> > 
> > I don't think so, I could boot with ramdisk 120MB in size or even
> > larger. It is probably an address calculation error: it is easy to make
> > a small mistake in the addresses so that they end up overlapping.
> > Sometimes it is even U-Boot that causes the overlaps.
> > 
> > I would suggest to use ImageBuilder to create the U-Boot boot script to
> > load all the binaries and boot the system. Have a look at
> > uboot-script-gen in particular:
> > 
> > https://gitlab.com/ViryaOS/imagebuilder/blob/master/scripts/uboot-script-gen
> 
> Nice script, but it seems to contain hardcoded values (see offset and memaddr
> override), does not take reserved regions into account, and assumes where
> U-boot/ATF may be loaded. So it may require some work before it can be used on
> an NXP board...

Yes, you are right about that. The script doesn't understand
reserved-memory today and it will just start loading binaries at 2MB
after "MEMORY_START" as specified in the config file, assuming that it
is safe to do so.
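
Until the script handles this, overlapping load addresses can be caught
by hand with a quick check like the one below. It is only a sketch: the
function name and the example layout are hypothetical and are not taken
from ImageBuilder or from Andrei's board.

```python
MiB = 1024 * 1024


def find_overlaps(regions):
    """Return pairs of region names whose address ranges overlap.

    regions is a list of (name, start, size) tuples with size > 0.
    """
    ordered = sorted(regions, key=lambda r: r[1])
    overlaps = []
    for i, (name_a, start_a, size_a) in enumerate(ordered):
        end_a = start_a + size_a
        for name_b, start_b, _ in ordered[i + 1:]:
            # Sorted by start address, so nothing later can overlap
            # once a region starts at or after the end of this one.
            if start_b >= end_a:
                break
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical layout, for illustration only.
layout = [
    ("kernel", 0x80200000, 16 * MiB),
    ("ramdisk", 0x81000000, 60 * MiB),
    ("dtb", 0x86000000, 64 * 1024),
]
for a, b in find_overlaps(layout):
    print("overlap: %s and %s" % (a, b))
```

In this made-up layout the 16MB kernel runs past the start of the
ramdisk, which is the kind of mistake that can produce the "Couldn't
find valid RAM disk image" symptom.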

Andrei, if you end up using it and it doesn't work, please let me know.
I am interested in understanding any failures and might be able to
improve the script or take patches for it.
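
As an aside, the "roughly 372KB" figure Julien quotes above can be
checked with a couple of lines. The sketch assumes the first guest RAM
bank starts at 0x40000000, which is my understanding of the default Xen
Arm guest layout (GUEST_RAM0_BASE); the variable names are mine.

```python
# Assumption: base of the first guest RAM bank in Xen's Arm guest layout.
GUEST_RAM0_BASE = 0x40000000

# Physical address of the uSDHC node, taken from the DOM1 log.
USDHC_BASE = 0x4005D000

offset = USDHC_BASE - GUEST_RAM0_BASE
offset_kib = offset // 1024

print(hex(offset), offset_kib, "KiB")
```

So without a 1:1 mapping the device address lands inside what the guest
sees as RAM, which would be consistent with the resource request
failing with -16 (EBUSY).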


Re: [Xen-devel] REGRESSION: Xen 4.13 RC5 fails to bootstrap Dom0 on ARM

2019-12-18 Thread Julien Grall

Hi Roman,

On 18/12/2019 17:03, Roman Shaposhnik wrote:

On Wed, Dec 18, 2019 at 3:50 AM Julien Grall  wrote:
So -- nothing boots directly by UEFI -- everything goes through GRUB.

However, my understanding is that GRUB will detect devicetree
information provided by UEFI (even though devicetree command is
supposed to completely replace that). Hence it is possible that Linux
relies on some residuals left in memory by GRUB that Xen doesn't pay
attention to (but this is a pretty wild speculation only).


While it goes through GRUB, it is a bootloader and will just act as a 
proxy for EFI. So EFI applications such as Xen/Linux can still be loaded 
and take advantage of runtime services if present/implemented.


In fact, most people on Arm use GRUB rather than EFI directly, as 
it is friendlier to use.


Regarding the devicetree, Xen and Linux will completely ignore the 
memory nodes when using EFI. This is because the EFI memory map will 
give you an overview of the platform with the EFI regions included.


Cheers,

--
Julien Grall


Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork

2019-12-18 Thread Julien Grall

Hi Tamas,

On 18/12/2019 19:40, Tamas K Lengyel wrote:

Implement hypercall that allows a fork to shed all memory that got allocated
for it during its execution and re-load its vCPU context from the parent VM.
This allows the forked VM to be reset to the same state as the parent VM
faster than creating a new fork would. Measurements show about a 2x
speedup during normal fuzzing operations. Performance may vary depending on how
much memory got allocated for the forked VM. If it has been completely
deduplicated from the parent VM then creating a new fork would likely be more
performant.

Signed-off-by: Tamas K Lengyel 
---
  xen/arch/x86/mm/mem_sharing.c | 105 ++
  xen/include/public/memory.h   |   1 +
  2 files changed, 106 insertions(+)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index e93ad2ec5a..4735a334b9 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct 
domain *cd)
  return 0;
  }
  
+struct gfn_free;

+struct gfn_free {
+struct gfn_free *next;
+struct page_info *page;
+gfn_t gfn;
+};
+
+static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
+{
+int rc;
+
+struct p2m_domain* p2m = p2m_get_hostp2m(cd);
+struct gfn_free *list = NULL;
+struct page_info *page;
+
+page_list_for_each(page, &cd->page_list)


AFAICT, your domain is not paused, so it would be possible to have pages 
added/removed in that list behind your back.


You also have multiple loops over the page_list in this function. Given the 
number of pages in the page_list can be quite big, this is a call for hogging 
the pCPU and an RCU lock on the domain vCPU running this call.



+{
+mfn_t mfn = page_to_mfn(page);
+if ( mfn_valid(mfn) )
+{
+p2m_type_t p2mt;
+p2m_access_t p2ma;
+gfn_t gfn = mfn_to_gfn(cd, mfn);
+mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
+0, NULL, false);
+if ( p2m_is_ram(p2mt) )
+{
+struct gfn_free *gfn_free;
+if ( !get_page(page, cd) )
+goto err_reset;
+
+/*
+ * We can't free the page while iterating over the page_list
+ * so we build a separate list to loop over.
+ *
+ * We want to iterate over the page_list instead of checking
+ * gfn from 0 to max_gfn because this is ~10x faster.
+ */
+gfn_free = xmalloc(struct gfn_free);


If I did the math right, for a 4G guest this will require ~24MB of 
memory. Actually, is it really necessary to do the allocation for a 
short period of time?


What are you trying to achieve by iterating twice on the GFN? Wouldn't 
it be easier to pause the domain?
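

For reference, the ~24MB estimate works out as follows. This sketch
assumes 4K pages, that every guest page ends up on the temporary list,
and that each struct gfn_free field (next, page, gfn) takes 8 bytes on
x86-64; allocator overhead per xmalloc() is ignored, so the real cost
would be somewhat higher.

```python
GUEST_BYTES = 4 * 1024 ** 3   # 4G guest
PAGE_SIZE = 4096              # assumption: 4K pages only
ENTRY_BYTES = 3 * 8           # next pointer + page pointer + gfn_t

pages = GUEST_BYTES // PAGE_SIZE
total_bytes = pages * ENTRY_BYTES
total_mib = total_bytes // (1024 * 1024)

print(pages, "pages,", total_mib, "MiB of list entries")
```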



+if ( !gfn_free )
+goto err_reset;
+
+gfn_free->gfn = gfn;
+gfn_free->page = page;
+gfn_free->next = list;
+list = gfn_free;
+}
+}
+}
+
+while ( list )
+{
+struct gfn_free *next = list->next;
+
+rc = p2m->set_entry(p2m, list->gfn, INVALID_MFN, PAGE_ORDER_4K,
+p2m_invalid, p2m_access_rwx, -1);
+put_page_alloc_ref(list->page);
+put_page(list->page);
+
+xfree(list);
+list = next;
+
+ASSERT(!rc);
+}
+
+if ( (rc = fork_hvm(d, cd)) )
+return rc;
+
+ err_reset:
+while ( list )
+{
+struct gfn_free *next = list->next;
+
+put_page(list->page);
+xfree(list);
+list = next;
+}
+
+return 0;
+}
+


Cheers,

--
Julien Grall


Re: [Xen-devel] [PATCH v2 11/20] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool

2019-12-18 Thread Julien Grall

Hi Tamas,

On 18/12/2019 19:40, Tamas K Lengyel wrote:

MEM_SHARING_DESTROY_GFN is used on the 'flags' bitfield during unsharing.
However, the bitfield is not used for anything else, so just convert it to a
bool instead.

Signed-off-by: Tamas K Lengyel 
---
  xen/arch/x86/mm/mem_sharing.c | 7 +++
  xen/arch/x86/mm/p2m.c | 1 +
  xen/common/memory.c   | 2 +-
  xen/include/asm-x86/mem_sharing.h | 5 ++---
  4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index fc1d8be1eb..6e81e1a895 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1175,7 +1175,7 @@ err_out:
   */
  int __mem_sharing_unshare_page(struct domain *d,
 unsigned long gfn,
-   uint16_t flags)
+   bool destroy)
  {
  p2m_type_t p2mt;
  mfn_t mfn;
@@ -1231,7 +1231,7 @@ int __mem_sharing_unshare_page(struct domain *d,
   * If the GFN is getting destroyed drop the references to MFN
   * (possibly freeing the page), and exit early.
   */
-if ( flags & MEM_SHARING_DESTROY_GFN )
+if ( destroy )
  {
  if ( !last_gfn )
  mem_sharing_gfn_destroy(page, d, gfn_info);
@@ -1321,8 +1321,7 @@ int relinquish_shared_pages(struct domain *d)
  if ( mfn_valid(mfn) && p2m_is_shared(t) )
  {
  /* Does not fail with ENOMEM given the DESTROY flag */
-BUG_ON(__mem_sharing_unshare_page(d, gfn,
-   MEM_SHARING_DESTROY_GFN));
+BUG_ON(__mem_sharing_unshare_page(d, gfn, true));
  /*
   * Clear out the p2m entry so no one else may try to
   * unshare.  Must succeed: we just read the old entry and
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index baea632acc..53ea44fe3c 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -517,6 +517,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, 
unsigned long gfn_l,
   */
  if ( mem_sharing_unshare_page(p2m->domain, gfn_l) < 0 )
  mem_sharing_notify_enomem(p2m->domain, gfn_l, false);
+


This line looks spurious.


  mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
  }
  
diff --git a/xen/common/memory.c b/xen/common/memory.c

index 309e872edf..c7d2bac452 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -352,7 +352,7 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
   * might be the only one using this shared page, and we need to
   * trigger proper cleanup. Once done, this is like any other page.
   */
-rc = mem_sharing_unshare_page(d, gmfn, 0);
+rc = mem_sharing_unshare_page(d, gmfn);


AFAICT, this patch does not reduce the number of parameters for 
mem_sharing_unshare_page(). Did you intend to make this change in 
another patch?



  if ( rc )
  {
  mem_sharing_notify_enomem(d, gmfn, false);
diff --git a/xen/include/asm-x86/mem_sharing.h 
b/xen/include/asm-x86/mem_sharing.h
index 89cdaccea0..4b982a4803 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -76,17 +76,16 @@ struct page_sharing_info
  unsigned int mem_sharing_get_nr_saved_mfns(void);
  unsigned int mem_sharing_get_nr_shared_mfns(void);
  
-#define MEM_SHARING_DESTROY_GFN   (1<<1)

  /* Only fails with -ENOMEM. Enforce it with a BUG_ON wrapper. */
  int __mem_sharing_unshare_page(struct domain *d,
 unsigned long gfn,
-   uint16_t flags);
+   bool destroy);
  
  static inline

  int mem_sharing_unshare_page(struct domain *d,
   unsigned long gfn)
  {
-int rc = __mem_sharing_unshare_page(d, gfn, 0);
+int rc = __mem_sharing_unshare_page(d, gfn, false);
  BUG_ON(rc && (rc != -ENOMEM));
  return rc;
  }



Cheers,

--
Julien Grall


Re: [Xen-devel] [PATCH v4 1/6] arm/arm64/xen: hypercall.h add includes guards

2019-12-18 Thread Pavel Tatashin
> >   /*
> > -  * Whenever we re-enter userspace, the domains should always be
> > +  * Whenever we re-enter kernel, the domains should always be
>
> This feels unrelated to the rest of the patch and probably wants an
> explanation. So I think this wants to be in a separate patch.

I will simply remove this comment fix, since I do not change anything
else in this file anymore.

> The rest of the patch looks good to me.

Thank you Julien.

>
> Cheers,
>
> --
> Julien Grall


Re: [Xen-devel] [PATCH v4 2/6] arm/arm64/xen: use C inlines for privcmd_call

2019-12-18 Thread Pavel Tatashin
On Mon, Dec 16, 2019 at 3:41 PM Julien Grall  wrote:
>
> Hello,
>
> On 04/12/2019 23:20, Pavel Tatashin wrote:
> > privcmd_call requires to enable access to userspace for the
> > duration of the hypercall.
> >
> > Currently, this is done via assembly macros. Change it to C
> > inlines instead.
> >
> > Signed-off-by: Pavel Tatashin 
> > Acked-by: Stefano Stabellini 
>
> Reviewed-by: Julien Grall 

Great, thank you!

Pasha


[Xen-devel] [xen-4.13-testing test] 144932: tolerable FAIL - PUSHED

2019-12-18 Thread osstest service owner
flight 144932 xen-4.13-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144932/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 12 guest-start  fail REGR. vs. 144774

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-pvshim                          12 guest-start               fail  never pass
 test-amd64-amd64-libvirt                           13 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-seattle                        13 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-seattle                        14 saverestore-support-check fail  never pass
 test-amd64-i386-libvirt-xsm                        13 migrate-support-check     fail  never pass
 test-amd64-i386-libvirt                            13 migrate-support-check     fail  never pass
 test-amd64-amd64-libvirt-xsm                       13 migrate-support-check     fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check     fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  11 migrate-support-check     fail  never pass
 test-amd64-amd64-qemuu-nested-amd                  17 debian-hvm-install/l1/l2  fail  never pass
 test-arm64-arm64-xl-xsm                            13 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-xsm                            14 saverestore-support-check fail  never pass
 test-arm64-arm64-xl                                13 migrate-support-check     fail  never pass
 test-arm64-arm64-xl                                14 saverestore-support-check fail  never pass
 test-arm64-arm64-xl-credit2                        13 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-credit2                        14 saverestore-support-check fail  never pass
 test-arm64-arm64-libvirt-xsm                       13 migrate-support-check     fail  never pass
 test-arm64-arm64-libvirt-xsm                       14 saverestore-support-check fail  never pass
 test-arm64-arm64-xl-thunderx                       13 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-thunderx                       14 saverestore-support-check fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64               17 guest-stop                fail  never pass
 test-amd64-amd64-libvirt-vhd                       12 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-credit1                        13 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-credit1                        14 saverestore-support-check fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64                17 guest-stop                fail  never pass
 test-amd64-i386-xl-qemut-win7-amd64                17 guest-stop                fail  never pass
 test-armhf-armhf-xl                                13 migrate-support-check     fail  never pass
 test-armhf-armhf-xl                                14 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-credit2                        13 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-credit2                        14 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-cubietruck                     13 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-multivcpu                      13 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-cubietruck                     14 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu                      14 saverestore-support-check fail  never pass
 test-armhf-armhf-libvirt                           13 migrate-support-check     fail  never pass
 test-armhf-armhf-libvirt                           14 saverestore-support-check fail  never pass
 test-amd64-i386-xl-qemut-ws16-amd64                17 guest-stop                fail  never pass
 test-amd64-amd64-xl-qemut-win7-amd64               17 guest-stop                fail  never pass
 test-armhf-armhf-libvirt-raw                       12 migrate-support-check     fail  never pass
 test-armhf-armhf-libvirt-raw                       13 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-vhd                            12 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-vhd                            13 saverestore-support-check fail  never pass
 test-amd64-i386-xl-qemuu-ws16-amd64                17 guest-stop                fail  never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64               17 guest-stop                fail  never pass
 test-armhf-armhf-xl-credit1                        13 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-credit1                        14 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-arndale                        13 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-arndale                        14 saverestore-support-check fail  never pass
 test-amd64-amd64-xl-qemut-ws16-amd64               17 guest-stop                fail  never pass

version targeted for testing:
 xen  a2e84d8e42c9e878fff17b738d8e5c5d83888f31
baseline version:
 xen  ddccd9f87ef8accdff518dc2ebb64c05f55cd278

Last test of basis   144774  2019-12-12 22:39:31 Z    5 days
Testing same since   144932  2019-12-18 12:06:15 Z    0 days    1 attempts


People who touched revisions under test:
  Ian Jackson 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 

Re: [Xen-devel] [PATCH v2 6/6] x86: implement Hyper-V clock source

2019-12-18 Thread Michael Kelley
From: Durrant, Paul  Sent: Wednesday, December 18, 2019 
7:24 AM

> > From: Wei Liu  On Behalf Of Wei Liu
> > Sent: 18 December 2019 14:43

[snip]

> > +
> > +static inline uint64_t read_hyperv_timer(void)
> > +{
> > +uint64_t scale, offset, ret, tsc;
> > +uint32_t seq;
> > +const struct ms_hyperv_tsc_page *tsc_page = hyperv_tsc;
> > +
> > +do {
> > +seq = tsc_page->tsc_sequence;
> > +
> > +/* Seq 0 is special. It means the TSC enlightenment is not
> > + * available at the moment. The reference time can only be
> > + * obtained from the Reference Counter MSR.
> > + */
> > +if ( seq == 0 )
> 
> Older versions of the spec used to use 0x I think, although when I 
> look again they
> seem to have been retro-actively fixed. In any case I think you should treat 
> both
> 0x and 0 as invalid.

FWIW, the 0x was just a bug in the spec.  Hyper-V implementations only
set the value to 0 to indicate invalid.  The equivalent Linux code checks only 
for 0.

Michael


Re: [Xen-devel] [PATCH] x86/save: reserve HVM save record numbers that have been consumed...

2019-12-18 Thread Andrew Cooper
On 18/12/2019 16:09, Paul Durrant wrote:
> ...for patches not (yet) upstream.
>
> This patch is simply reserving save record number space to avoid the
> risk of clashes between existent downstream changes made by Amazon and
> future upstream changes which may be incompatible.
>
> Signed-off-by: Paul Durrant 

Is this "you've already used some of these", or you plan to?

~Andrew


[Xen-devel] [PATCH v2 14/20] x86/mem_sharing: Enable mem_sharing on first memop

2019-12-18 Thread Tamas K Lengyel
It is wasteful to require separate hypercalls to enable sharing on both the
parent and the client domain during VM forking. To speed things up we enable
sharing on the first memop in case it wasn't already enabled.

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/mm/mem_sharing.c | 39 +--
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index e5c1424f9b..48809a5349 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1402,6 +1402,24 @@ static int range_share(struct domain *d, struct domain 
*cd,
 return rc;
 }
 
+static inline int mem_sharing_control(struct domain *d, bool enable)
+{
+if ( enable )
+{
+if ( unlikely(!is_hvm_domain(d)) )
+return -ENOSYS;
+
+if ( unlikely(!hap_enabled(d)) )
+return -ENODEV;
+
+if ( unlikely(is_iommu_enabled(d)) )
+return -EXDEV;
+}
+
+d->arch.hvm.mem_sharing.enabled = enable;
+return 0;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
 int rc;
@@ -1423,10 +1441,8 @@ int 
mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 if ( rc )
 goto out;
 
-/* Only HAP is supported */
-rc = -ENODEV;
-if ( !mem_sharing_enabled(d) )
-goto out;
+if ( !mem_sharing_enabled(d) && (rc = mem_sharing_control(d, true)) )
+return rc;
 
 switch ( mso.op )
 {
@@ -1675,24 +1691,15 @@ int mem_sharing_domctl(struct domain *d, struct 
xen_domctl_mem_sharing_op *mec)
 {
 int rc;
 
-/* Only HAP is supported */
-if ( !hap_enabled(d) )
- return -ENODEV;
-
 switch(mec->op)
 {
 case XEN_DOMCTL_MEM_SHARING_CONTROL:
-{
-rc = 0;
-if ( unlikely(is_iommu_enabled(d) && mec->u.enable) )
-rc = -EXDEV;
-else
-d->arch.hvm.mem_sharing.enabled = mec->u.enable;
-}
-break;
+rc = mem_sharing_control(d, mec->u.enable);
+break;
 
 default:
 rc = -ENOSYS;
+break;
 }
 
 return rc;
-- 
2.20.1
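
For anyone skimming the series, the resulting control flow can be
modelled roughly as below. This is a simplified sketch, not the
hypervisor code: the Domain class and its field names are stand-ins I
made up for is_hvm_domain(), hap_enabled(), is_iommu_enabled() and
d->arch.hvm.mem_sharing.enabled, and the memop dispatch is elided.

```python
import errno
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical stand-in for the bits of struct domain used here."""
    is_hvm: bool
    hap: bool
    iommu: bool
    sharing_enabled: bool = False


def mem_sharing_control(d, enable):
    """Enable or disable sharing; checks only apply when enabling."""
    if enable:
        if not d.is_hvm:
            return -errno.ENOSYS
        if not d.hap:
            return -errno.ENODEV
        if d.iommu:
            return -errno.EXDEV
    d.sharing_enabled = enable
    return 0


def mem_sharing_memop(d):
    """Enable sharing lazily on the first memop, then carry on."""
    if not d.sharing_enabled:
        rc = mem_sharing_control(d, True)
        if rc:
            return rc
    return 0  # the switch on mso.op would follow here


guest = Domain(is_hvm=True, hap=True, iommu=False)
print(mem_sharing_memop(guest), guest.sharing_enabled)
```

The point of the model is that the domctl and the first memop share one
set of checks, so a memop on an unsuitable domain fails with the same
error the explicit enable would have returned.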



[Xen-devel] [PATCH v2 15/20] x86/mem_sharing: Skip xen heap pages in memshr nominate

2019-12-18 Thread Tamas K Lengyel
Trying to share these would fail anyway, better to skip them early.

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/mm/mem_sharing.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 48809a5349..b3607b1bce 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -852,6 +852,11 @@ static int nominate_page(struct domain *d, gfn_t gfn,
 if ( !p2m_is_sharable(p2mt) )
 goto out;
 
+/* Skip xen heap pages */
+page = mfn_to_page(mfn);
+if ( !page || is_xen_heap_page(page) )
+goto out;
+
 /* Check if there are mem_access/remapped altp2m entries for this page */
 if ( altp2m_active(d) )
 {
@@ -882,7 +887,6 @@ static int nominate_page(struct domain *d, gfn_t gfn,
 }
 
 /* Try to convert the mfn to the sharable type */
-page = mfn_to_page(mfn);
 ret = page_make_sharable(d, page, expected_refcnt);
 if ( ret )
 goto out;
-- 
2.20.1



[Xen-devel] [PATCH v2 20/20] xen/tools: VM forking toolstack side

2019-12-18 Thread Tamas K Lengyel
Add necessary bits to implement "xl fork-vm", "xl fork-launch-dm" and
"xl fork-reset" commands. The process is split in two to allow tools needing
access to the new VM as fast as possible after it was forked. It is expected
that under certain use-cases the second command that launches QEMU will be
skipped entirely.

Signed-off-by: Tamas K Lengyel 
---
 tools/libxc/include/xenctrl.h |   6 +
 tools/libxc/xc_memshr.c   |  22 
 tools/libxl/libxl.h   |   7 +
 tools/libxl/libxl_create.c| 237 +++---
 tools/libxl/libxl_dm.c|   2 +-
 tools/libxl/libxl_dom.c   |  83 
 tools/libxl/libxl_internal.h  |   1 +
 tools/libxl/libxl_types.idl   |   1 +
 tools/xl/xl.h |   5 +
 tools/xl/xl_cmdtable.c|  22 
 tools/xl/xl_saverestore.c |  96 ++
 tools/xl/xl_vmcontrol.c   |   8 ++
 12 files changed, 386 insertions(+), 104 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index b5ffa53d55..39afdb9b33 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2221,6 +2221,12 @@ int xc_memshr_range_share(xc_interface *xch,
   uint64_t first_gfn,
   uint64_t last_gfn);
 
+int xc_memshr_fork(xc_interface *xch,
+   uint32_t source_domain,
+   uint32_t client_domain);
+
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t forked_domain);
+
 /* Debug calls: return the number of pages referencing the shared frame backing
  * the input argument. Should be one or greater.
  *
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index 5ef56a6933..ef5a5ee6a4 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -237,6 +237,28 @@ int xc_memshr_debug_gref(xc_interface *xch,
 return xc_memshr_memop(xch, domid, &mso);
 }
 
+int xc_memshr_fork(xc_interface *xch, uint32_t pdomid, uint32_t domid)
+{
+xen_mem_sharing_op_t mso;
+
+memset(&mso, 0, sizeof(mso));
+
+mso.op = XENMEM_sharing_op_fork;
+mso.u.fork.parent_domain = pdomid;
+
+return xc_memshr_memop(xch, domid, &mso);
+}
+
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t domid)
+{
+xen_mem_sharing_op_t mso;
+
+memset(&mso, 0, sizeof(mso));
+mso.op = XENMEM_sharing_op_fork_reset;
+
+return xc_memshr_memop(xch, domid, &mso);
+}
+
 int xc_memshr_audit(xc_interface *xch)
 {
 xen_mem_sharing_op_t mso;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 54abb9db1f..75cb070587 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1536,6 +1536,13 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
 const libxl_asyncop_how *ao_how,
 const libxl_asyncprogress_how *aop_console_how)
 LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
+ LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_launch_dm(libxl_ctx *ctx, libxl_domain_config *d_config,
+uint32_t domid,
+const libxl_asyncprogress_how *aop_console_how)
+LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_reset(libxl_ctx *ctx, uint32_t domid);
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
 uint32_t *domid, int restore_fd,
 int send_back_fd,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 32d45dcef0..e0d219596c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -536,12 +536,12 @@ out:
 return ret;
 }
 
-int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
-   libxl__domain_build_state *state,
-   uint32_t *domid)
+static int libxl__domain_make_xs_entries(libxl__gc *gc, libxl_domain_config *d_config,
+ libxl__domain_build_state *state,
+ uint32_t domid)
 {
 libxl_ctx *ctx = libxl__gc_owner(gc);
-int ret, rc, nb_vm;
+int rc, nb_vm;
 const char *dom_type;
 char *uuid_string;
 char *dom_path, *vm_path, *libxl_path;
@@ -553,7 +553,6 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 
 /* convenience aliases */
 libxl_domain_create_info *info = &d_config->c_info;
-libxl_domain_build_info *b_info = &d_config->b_info;
 
 uuid_string = libxl__uuid2string(gc, info->uuid);
 if (!uuid_string) {
@@ -561,64 +560,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 goto out;
 }
 
-/* Valid domid here means we're soft resetting. */
-if (!libxl_domid_valid_guest(*domid)) {
-struct xen_domctl_createdomain create = {
-.ssidref = info->ssidref,
- 

[Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork

2019-12-18 Thread Tamas K Lengyel
Implement a hypercall that allows a fork to shed all memory that was allocated
for it during its execution and reload its vCPU context from the parent VM.
This resets the forked VM to the state the parent VM is in faster than
creating a new fork would. Measurements show about a 2x speedup during normal
fuzzing operations. Performance may vary depending on how much memory was
allocated for the forked VM. If it has been completely deduplicated from the
parent VM, then creating a new fork would likely be more performant.

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/mm/mem_sharing.c | 105 ++
 xen/include/public/memory.h   |   1 +
 2 files changed, 106 insertions(+)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index e93ad2ec5a..4735a334b9 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
 return 0;
 }
 
+struct gfn_free;
+struct gfn_free {
+struct gfn_free *next;
+struct page_info *page;
+gfn_t gfn;
+};
+
+static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
+{
+int rc;
+
+struct p2m_domain* p2m = p2m_get_hostp2m(cd);
+struct gfn_free *list = NULL;
+struct page_info *page;
+
+page_list_for_each(page, &cd->page_list)
+{
+mfn_t mfn = page_to_mfn(page);
+if ( mfn_valid(mfn) )
+{
+p2m_type_t p2mt;
+p2m_access_t p2ma;
+gfn_t gfn = mfn_to_gfn(cd, mfn);
+mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
+0, NULL, false);
+if ( p2m_is_ram(p2mt) )
+{
+struct gfn_free *gfn_free;
+if ( !get_page(page, cd) )
+goto err_reset;
+
+/*
+ * We can't free the page while iterating over the page_list
+ * so we build a separate list to loop over.
+ *
+ * We want to iterate over the page_list instead of checking
+ * gfn from 0 to max_gfn because this is ~10x faster.
+ */
+gfn_free = xmalloc(struct gfn_free);
+if ( !gfn_free )
+goto err_reset;
+
+gfn_free->gfn = gfn;
+gfn_free->page = page;
+gfn_free->next = list;
+list = gfn_free;
+}
+}
+}
+
+while ( list )
+{
+struct gfn_free *next = list->next;
+
+rc = p2m->set_entry(p2m, list->gfn, INVALID_MFN, PAGE_ORDER_4K,
+p2m_invalid, p2m_access_rwx, -1);
+put_page_alloc_ref(list->page);
+put_page(list->page);
+
+xfree(list);
+list = next;
+
+ASSERT(!rc);
+}
+
+if ( (rc = fork_hvm(d, cd)) )
+return rc;
+
+ err_reset:
+while ( list )
+{
+struct gfn_free *next = list->next;
+
+put_page(list->page);
+xfree(list);
+list = next;
+}
+
+return 0;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
 int rc;
@@ -1905,6 +1986,30 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 rcu_unlock_domain(pd);
 break;
 }
+
+case XENMEM_sharing_op_fork_reset:
+{
+struct domain *pd;
+
+rc = -EINVAL;
+if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
+ mso.u.fork._pad[2] )
+ goto out;
+
+rc = -ENOSYS;
+if ( !d->parent )
+goto out;
+
+rc = rcu_lock_live_remote_domain_by_id(d->parent->domain_id, &pd);
+if ( rc )
+goto out;
+
+rc = mem_sharing_fork_reset(pd, d);
+
+rcu_unlock_domain(pd);
+break;
+}
+
 default:
 rc = -ENOSYS;
 break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 90a3f4498e..e3d063e22e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -483,6 +483,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_audit 7
 #define XENMEM_sharing_op_range_share   8
 #define XENMEM_sharing_op_fork  9
+#define XENMEM_sharing_op_fork_reset10
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
-- 
2.20.1
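
The reset path in the patch above collects pages on a side list while walking the domain's page list and only releases them afterwards, because the list cannot be modified during iteration. A minimal user-space sketch of that two-pass pattern follows. It assumes a plain singly linked list; `struct page`, `struct deferred`, `collect_ram_pages` and `release_all` are hypothetical names for illustration, not Xen interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page owned by the fork. */
struct page {
    struct page *next;   /* link in the owner's page list */
    bool is_ram;         /* only RAM pages are released on reset */
    bool released;       /* set once the page has been dropped */
};

/* Side-list node, playing the role of struct gfn_free in the patch. */
struct deferred {
    struct deferred *next;
    struct page *page;
};

/*
 * Pass 1: walk the page list without modifying it and remember every
 * RAM page on a side list. Returns false on allocation failure; the
 * caller then still owns the nodes collected so far in *out.
 */
static bool collect_ram_pages(struct page *head, struct deferred **out)
{
    for (struct page *pg = head; pg; pg = pg->next) {
        struct deferred *node;

        if (!pg->is_ram)
            continue;

        node = malloc(sizeof(*node));
        if (!node)
            return false;

        node->page = pg;
        node->next = *out;
        *out = node;
    }
    return true;
}

/* Pass 2: drop every collected page; returns how many were released. */
static size_t release_all(struct deferred *list)
{
    size_t n = 0;

    while (list) {
        struct deferred *next = list->next;

        list->page->released = true;
        free(list);
        list = next;
        n++;
    }
    return n;
}
```

A caller would run `collect_ram_pages()` first and, only if it succeeds, hand the resulting list to `release_all()`. The cost of the extra allocation per page is the price for iterating the page list directly instead of probing every gfn.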


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 18/20] xen/mem_access: Use __get_gfn_type_access in set_mem_access

2019-12-18 Thread Tamas K Lengyel
Use __get_gfn_type_access instead of p2m->get_entry to trigger page-forking
when the mem_access permission is being set on a page that has not yet been
copied over from the parent.

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/mm/mem_access.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c
index 320b9fe621..9caf08a5b2 100644
--- a/xen/arch/x86/mm/mem_access.c
+++ b/xen/arch/x86/mm/mem_access.c
@@ -303,11 +303,10 @@ static int set_mem_access(struct domain *d, struct p2m_domain *p2m,
 ASSERT(!ap2m);
 #endif
 {
-mfn_t mfn;
 p2m_access_t _a;
 p2m_type_t t;
-
-mfn = p2m->get_entry(p2m, gfn, &t, &_a, 0, NULL, NULL);
+mfn_t mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &t, &_a,
+  P2M_ALLOC, NULL, false);
 rc = p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, t, a, -1);
 }
 
-- 
2.20.1



[Xen-devel] [PATCH v2 13/20] x86/mem_sharing: ASSERT that p2m_set_entry succeeds

2019-12-18 Thread Tamas K Lengyel
Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/mm/mem_sharing.c | 46 +--
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 90b6371e2f..e5c1424f9b 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1113,39 +1113,37 @@ int add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
 goto err_unlock;
 }
 
+/*
+ * Must succeed, we just read the entry and hold the p2m lock
+ * via get_two_gfns.
+ */
 ret = p2m_set_entry(p2m, _gfn(cgfn), smfn, PAGE_ORDER_4K,
 p2m_ram_shared, a);
+ASSERT(!ret);
 
-/* Tempted to turn this into an assert */
-if ( ret )
+/*
+ * There is a chance we're plugging a hole where a paged out
+ * page was.
+ */
+if ( p2m_is_paging(cmfn_type) && (cmfn_type != p2m_ram_paging_out) )
 {
-mem_sharing_gfn_destroy(spage, cd, gfn_info);
-put_page_and_type(spage);
-} else {
+atomic_dec(&cd->paged_pages);
 /*
- * There is a chance we're plugging a hole where a paged out
- * page was.
+ * Further, there is a chance this was a valid page.
+ * Don't leak it.
  */
-if ( p2m_is_paging(cmfn_type) && (cmfn_type != p2m_ram_paging_out) )
+if ( mfn_valid(cmfn) )
 {
-atomic_dec(&cd->paged_pages);
-/*
- * Further, there is a chance this was a valid page.
- * Don't leak it.
- */
-if ( mfn_valid(cmfn) )
-{
-struct page_info *cpage = mfn_to_page(cmfn);
+struct page_info *cpage = mfn_to_page(cmfn);
 
-if ( !get_page(cpage, cd) )
-{
-domain_crash(cd);
-ret = -EOVERFLOW;
-goto err_unlock;
-}
-put_page_alloc_ref(cpage);
-put_page(cpage);
+if ( !get_page(cpage, cd) )
+{
+domain_crash(cd);
+ret = -EOVERFLOW;
+goto err_unlock;
 }
+put_page_alloc_ref(cpage);
+put_page(cpage);
 }
 }
 
-- 
2.20.1



[Xen-devel] [PATCH v2 09/20] x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in relinquish_shared_pages

2019-12-18 Thread Tamas K Lengyel
While using _mfn(0) is of no consequence during teardown, INVALID_MFN is the
correct value that should be used.

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/mm/mem_sharing.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 5d81730315..1b7b520ccf 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1317,7 +1317,7 @@ int relinquish_shared_pages(struct domain *d)
 break;
 
 mfn = p2m->get_entry(p2m, _gfn(gfn), &t, &a, 0, NULL, NULL);
-if ( mfn_valid(mfn) && t == p2m_ram_shared )
+if ( mfn_valid(mfn) && p2m_is_shared(t) )
 {
 /* Does not fail with ENOMEM given the DESTROY flag */
 BUG_ON(__mem_sharing_unshare_page(d, gfn,
@@ -1327,7 +1327,7 @@ int relinquish_shared_pages(struct domain *d)
  * unshare.  Must succeed: we just read the old entry and
  * we hold the p2m lock.
  */
-set_rc = p2m->set_entry(p2m, _gfn(gfn), _mfn(0), PAGE_ORDER_4K,
+set_rc = p2m->set_entry(p2m, _gfn(gfn), INVALID_MFN, PAGE_ORDER_4K,
 p2m_invalid, p2m_access_rwx, -1);
 ASSERT(!set_rc);
 count += 0x10;
-- 
2.20.1
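
To illustrate why the patch above prefers INVALID_MFN over _mfn(0) when clearing an entry: frame 0 is a real frame number, so writing it into a cleared entry is indistinguishable from mapping frame 0, whereas an all-ones sentinel can never name a usable frame. The sketch below is a simplified model, not Xen's definitions; `sketch_mfn`, `SKETCH_INVALID_MFN`, `SKETCH_MAX_PAGE` and `sketch_mfn_valid` are hypothetical names, and the bound is an arbitrary assumption.

```c
#include <stdbool.h>

/* Simplified model of a typesafe machine frame number. */
typedef struct { unsigned long m; } sketch_mfn_t;

static inline sketch_mfn_t sketch_mfn(unsigned long m)
{
    sketch_mfn_t r = { m };
    return r;
}

static inline unsigned long sketch_mfn_x(sketch_mfn_t mfn)
{
    return mfn.m;
}

/* All-ones sentinel: cannot collide with a real frame number. */
#define SKETCH_INVALID_MFN sketch_mfn(~0UL)

/* Hypothetical upper bound on usable frames, for this sketch only. */
#define SKETCH_MAX_PAGE 0x100000UL

/* A frame is "valid" here if it falls below the assumed bound. */
static inline bool sketch_mfn_valid(sketch_mfn_t mfn)
{
    return sketch_mfn_x(mfn) < SKETCH_MAX_PAGE;
}
```

With this model, `sketch_mfn_valid(sketch_mfn(0))` is true while `sketch_mfn_valid(SKETCH_INVALID_MFN)` is false, which is the distinction a cleared entry should carry.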



[Xen-devel] [PATCH v2 07/20] x86/mem_sharing: don't try to unshare twice during page fault

2019-12-18 Thread Tamas K Lengyel
An attempt to unshare the page was already made in get_gfn_type_access. If
that didn't work, then trying again is pointless. Don't try to send a vm_event
again either; simply check whether there is a ring.

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/hvm/hvm.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e055114922..8f90841813 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1706,11 +1707,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
 struct domain *currd = curr->domain;
 struct p2m_domain *p2m, *hostp2m;
 int rc, fall_through = 0, paged = 0;
-int sharing_enomem = 0;
 vm_event_request_t *req_ptr = NULL;
 bool sync = false;
 unsigned int page_order;
 
+#ifdef CONFIG_MEM_SHARING
+bool sharing_enomem = false;
+#endif
+
 /* On Nested Virtualization, walk the guest page table.
  * If this succeeds, all is fine.
  * If this fails, inject a nested page fault into the guest.
@@ -1898,14 +1902,16 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
 if ( p2m_is_paged(p2mt) || (p2mt == p2m_ram_paging_out) )
 paged = 1;
 
-/* Mem sharing: unshare the page and try again */
-if ( npfec.write_access && (p2mt == p2m_ram_shared) )
+#ifdef CONFIG_MEM_SHARING
+/* Mem sharing: if still shared on write access then its enomem */
+if ( npfec.write_access && p2m_is_shared(p2mt) )
 {
 ASSERT(p2m_is_hostp2m(p2m));
-sharing_enomem = mem_sharing_unshare_page(currd, gfn);
+sharing_enomem = true;
 rc = 1;
 goto out_put_gfn;
 }
+#endif
 
 /* Spurious fault? PoD and log-dirty also take this path. */
 if ( p2m_is_ram(p2mt) )
@@ -1959,19 +1965,21 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
  */
 if ( paged )
 p2m_mem_paging_populate(currd, gfn);
+
+#ifdef CONFIG_MEM_SHARING
 if ( sharing_enomem )
 {
-int rv;
-
-if ( (rv = mem_sharing_notify_enomem(currd, gfn, true)) < 0 )
+if ( !vm_event_check_ring(currd->vm_event_share) )
 {
 gdprintk(XENLOG_ERR, "Domain %hu attempt to unshare "
- "gfn %lx, ENOMEM and no helper (rc %d)\n",
- currd->domain_id, gfn, rv);
+ "gfn %lx, ENOMEM and no helper\n",
+ currd->domain_id, gfn);
 /* Crash the domain */
 rc = 0;
 }
 }
+#endif
+
 if ( req_ptr )
 {
 if ( monitor_traps(curr, sync, req_ptr) < 0 )
-- 
2.20.1



[Xen-devel] [PATCH v2 11/20] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool

2019-12-18 Thread Tamas K Lengyel
MEM_SHARING_DESTROY_GFN is used on the 'flags' bitfield during unsharing.
However, the bitfield is not used for anything else, so just convert it to a
bool instead.

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/mm/mem_sharing.c | 7 +++
 xen/arch/x86/mm/p2m.c | 1 +
 xen/common/memory.c   | 2 +-
 xen/include/asm-x86/mem_sharing.h | 5 ++---
 4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index fc1d8be1eb..6e81e1a895 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1175,7 +1175,7 @@ err_out:
  */
 int __mem_sharing_unshare_page(struct domain *d,
unsigned long gfn,
-   uint16_t flags)
+   bool destroy)
 {
 p2m_type_t p2mt;
 mfn_t mfn;
@@ -1231,7 +1231,7 @@ int __mem_sharing_unshare_page(struct domain *d,
  * If the GFN is getting destroyed drop the references to MFN
  * (possibly freeing the page), and exit early.
  */
-if ( flags & MEM_SHARING_DESTROY_GFN )
+if ( destroy )
 {
 if ( !last_gfn )
 mem_sharing_gfn_destroy(page, d, gfn_info);
@@ -1321,8 +1321,7 @@ int relinquish_shared_pages(struct domain *d)
 if ( mfn_valid(mfn) && p2m_is_shared(t) )
 {
 /* Does not fail with ENOMEM given the DESTROY flag */
-BUG_ON(__mem_sharing_unshare_page(d, gfn,
-   MEM_SHARING_DESTROY_GFN));
+BUG_ON(__mem_sharing_unshare_page(d, gfn, true));
 /*
  * Clear out the p2m entry so no one else may try to
  * unshare.  Must succeed: we just read the old entry and
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index baea632acc..53ea44fe3c 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -517,6 +517,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
  */
 if ( mem_sharing_unshare_page(p2m->domain, gfn_l) < 0 )
 mem_sharing_notify_enomem(p2m->domain, gfn_l, false);
+
 mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 }
 
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 309e872edf..c7d2bac452 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -352,7 +352,7 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
  * might be the only one using this shared page, and we need to
  * trigger proper cleanup. Once done, this is like any other page.
  */
-rc = mem_sharing_unshare_page(d, gmfn, 0);
+rc = mem_sharing_unshare_page(d, gmfn);
 if ( rc )
 {
 mem_sharing_notify_enomem(d, gmfn, false);
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index 89cdaccea0..4b982a4803 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -76,17 +76,16 @@ struct page_sharing_info
 unsigned int mem_sharing_get_nr_saved_mfns(void);
 unsigned int mem_sharing_get_nr_shared_mfns(void);
 
-#define MEM_SHARING_DESTROY_GFN   (1<<1)
 /* Only fails with -ENOMEM. Enforce it with a BUG_ON wrapper. */
 int __mem_sharing_unshare_page(struct domain *d,
unsigned long gfn,
-   uint16_t flags);
+   bool destroy);
 
 static inline
 int mem_sharing_unshare_page(struct domain *d,
  unsigned long gfn)
 {
-int rc = __mem_sharing_unshare_page(d, gfn, 0);
+int rc = __mem_sharing_unshare_page(d, gfn, false);
 BUG_ON(rc && (rc != -ENOMEM));
 return rc;
 }
-- 
2.20.1



[Xen-devel] [PATCH v2 02/20] xen/x86: Make hap_get_allocation accessible

2019-12-18 Thread Tamas K Lengyel
During VM forking we'll copy the parent domain's parameters to the client,
including the HAP shadow memory setting that is used for storing the domain's
EPT. We'll copy this in the hypervisor instead of doing it during toolstack
launch to allow the domain to start executing and unsharing memory before (or
even completely without) the toolstack.

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/mm/hap/hap.c | 3 +--
 xen/include/asm-x86/hap.h | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 3d93f3451c..c7c7ff6e99 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -321,8 +321,7 @@ static void hap_free_p2m_page(struct domain *d, struct page_info *pg)
 }
 
 /* Return the size of the pool, rounded up to the nearest MB */
-static unsigned int
-hap_get_allocation(struct domain *d)
+unsigned int hap_get_allocation(struct domain *d)
 {
 unsigned int pg = d->arch.paging.hap.total_pages
 + d->arch.paging.hap.p2m_pages;
diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
index b94bfb4ed0..1bf07e49fe 100644
--- a/xen/include/asm-x86/hap.h
+++ b/xen/include/asm-x86/hap.h
@@ -45,6 +45,7 @@ int   hap_track_dirty_vram(struct domain *d,
 
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
 int hap_set_allocation(struct domain *d, unsigned int pages, bool *preempted);
+unsigned int hap_get_allocation(struct domain *d);
 
 #endif /* XEN_HAP_H */
 
-- 
2.20.1
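
The comment on hap_get_allocation() says it returns the pool size rounded up to the nearest MB. As a small sketch of that arithmetic, assuming 4 KiB pages (the `SKETCH_PAGE_SHIFT` constant and the `pages_to_mb_round_up` name are hypothetical, and this is not quoted from the function body):

```c
/* Assumption for this sketch: 4 KiB pages, i.e. 256 pages per MiB. */
#define SKETCH_PAGE_SHIFT 12

/* Convert a page count to MiB, rounding any partial MiB up. */
static unsigned int pages_to_mb_round_up(unsigned int pages)
{
    const unsigned int shift = 20 - SKETCH_PAGE_SHIFT;
    const unsigned int mask = (1u << shift) - 1;

    /* Whole MiB, plus one more if any pages are left over. */
    return (pages >> shift) + ((pages & mask) ? 1 : 0);
}
```

For example, 256 pages map to exactly 1 MiB, while 257 pages round up to 2 MiB.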



[Xen-devel] [PATCH v2 05/20] x86/mem_sharing: make get_two_gfns take locks conditionally

2019-12-18 Thread Tamas K Lengyel
During VM forking the client lock will already be taken.

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/mm/mem_sharing.c | 11 ++-
 xen/include/asm-x86/p2m.h | 10 +-
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 319aaf3074..c0e305ad71 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -954,7 +954,7 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
 unsigned long put_count = 0;
 
 get_two_gfns(sd, sgfn, &smfn_type, NULL, &smfn,
- cd, cgfn, &cmfn_type, NULL, &cmfn, 0, &tg);
+ cd, cgfn, &cmfn_type, NULL, &cmfn, 0, &tg, true);
 
 /*
  * This tricky business is to avoid two callers deadlocking if
@@ -1068,7 +1068,7 @@ err_out:
 }
 
 int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
-   struct domain *cd, unsigned long cgfn)
+   struct domain *cd, unsigned long cgfn, bool lock)
 {
 struct page_info *spage;
 int ret = -EINVAL;
@@ -1080,7 +1080,7 @@ int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle
 struct two_gfns tg;
 
 get_two_gfns(sd, _gfn(sgfn), &smfn_type, NULL, &smfn,
- cd, _gfn(cgfn), &cmfn_type, &a, &cmfn, 0, &tg);
+ cd, _gfn(cgfn), &cmfn_type, &a, &cmfn, 0, &tg, lock);
 
 /* Get the source shared page, check and lock */
 ret = XENMEM_SHARING_OP_S_HANDLE_INVALID;
@@ -1155,7 +1155,8 @@ int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle
 err_unlock:
 mem_sharing_page_unlock(spage);
 err_out:
-put_two_gfns(&tg);
+if ( lock )
+put_two_gfns(&tg);
 return ret;
 }
 
@@ -1574,7 +1575,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 sh  = mso.u.share.source_handle;
 cgfn= mso.u.share.client_gfn;
 
-rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn);
+rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn, true);
 
 rcu_unlock_domain(cd);
 }
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 94285db1b4..7399c4a897 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -539,7 +539,7 @@ struct two_gfns {
 static inline void get_two_gfns(struct domain *rd, gfn_t rgfn,
 p2m_type_t *rt, p2m_access_t *ra, mfn_t *rmfn, struct domain *ld,
 gfn_t lgfn, p2m_type_t *lt, p2m_access_t *la, mfn_t *lmfn,
-p2m_query_t q, struct two_gfns *rval)
+p2m_query_t q, struct two_gfns *rval, bool lock)
 {
 mfn_t   *first_mfn, *second_mfn, scratch_mfn;
 p2m_access_t*first_a, *second_a, scratch_a;
@@ -569,10 +569,10 @@ do {\
 #undef assign_pointers
 
 /* Now do the gets */
-*first_mfn  = get_gfn_type_access(p2m_get_hostp2m(rval->first_domain),
-  gfn_x(rval->first_gfn), first_t, first_a, q, NULL);
-*second_mfn = get_gfn_type_access(p2m_get_hostp2m(rval->second_domain),
-  gfn_x(rval->second_gfn), second_t, second_a, q, NULL);
+*first_mfn  = __get_gfn_type_access(p2m_get_hostp2m(rval->first_domain),
+gfn_x(rval->first_gfn), first_t, first_a, q, NULL, lock);
+*second_mfn = __get_gfn_type_access(p2m_get_hostp2m(rval->second_domain),
+gfn_x(rval->second_gfn), second_t, second_a, q, NULL, lock);
 }
 
 static inline void put_two_gfns(struct two_gfns *arg)
-- 
2.20.1
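
The patch above threads a `lock` flag through the lookup so that a caller which already holds the client's lock (as during VM forking) does not take it again, and correspondingly skips the release. A toy sketch of that conditional-locking shape follows. The counter stands in for a real lock, and `struct sketch_p2m`, `sketch_get_entry` and `sketch_put_entry` are hypothetical names, not the Xen p2m API.

```c
#include <stdbool.h>

#define SKETCH_ENTRIES 8

/* Toy table with a depth counter standing in for a real lock. */
struct sketch_p2m {
    int lock_depth;                         /* > 0 while "locked" */
    unsigned long entries[SKETCH_ENTRIES];  /* gfn -> mfn mapping */
};

/* Look up an entry, taking the lock only when the caller asks for it. */
static unsigned long sketch_get_entry(struct sketch_p2m *p2m,
                                      unsigned int gfn, bool lock)
{
    if (lock)
        p2m->lock_depth++;

    return p2m->entries[gfn % SKETCH_ENTRIES];
}

/* Release must mirror the lookup: drop the lock only if it was taken. */
static void sketch_put_entry(struct sketch_p2m *p2m, bool lock)
{
    if (lock)
        p2m->lock_depth--;
}
```

The important property is symmetry: whoever passes `lock = false` on the lookup must also pass it on the release, otherwise the lock held by the outer caller would be dropped too early.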



[Xen-devel] [PATCH v2 12/20] x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk

2019-12-18 Thread Tamas K Lengyel
Using XENLOG_ERR level since this is only used in debug paths (ie. it's
expected the user already has loglvl=all set).

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/mm/mem_sharing.c | 81 ++-
 1 file changed, 41 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 6e81e1a895..90b6371e2f 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -49,9 +49,6 @@ typedef struct pg_lock_data {
 
 static DEFINE_PER_CPU(pg_lock_data_t, __pld);
 
-#define MEM_SHARING_DEBUG(_f, _a...)  \
-debugtrace_printk("mem_sharing_debug: %s(): " _f, __func__, ##_a)
-
 /* Reverse map defines */
 #define RMAP_HASHTAB_ORDER  0
 #define RMAP_HASHTAB_SIZE   \
@@ -491,8 +488,9 @@ static int audit(void)
 /* If we can't lock it, it's definitely not a shared page */
 if ( !mem_sharing_page_lock(pg) )
 {
-   MEM_SHARING_DEBUG("mfn %lx in audit list, but cannot be locked (%lx)!\n",
-  mfn_x(mfn), pg->u.inuse.type_info);
+   gdprintk(XENLOG_ERR,
+"mfn %lx in audit list, but cannot be locked (%lx)!\n",
+mfn_x(mfn), pg->u.inuse.type_info);
errors++;
continue;
 }
@@ -500,8 +498,9 @@ static int audit(void)
 /* Check if the MFN has correct type, owner and handle. */
 if ( (pg->u.inuse.type_info & PGT_type_mask) != PGT_shared_page )
 {
-   MEM_SHARING_DEBUG("mfn %lx in audit list, but not PGT_shared_page (%lx)!\n",
-  mfn_x(mfn), pg->u.inuse.type_info & PGT_type_mask);
+   gdprintk(XENLOG_ERR,
+"mfn %lx in audit list, but not PGT_shared_page (%lx)!\n",
+mfn_x(mfn), pg->u.inuse.type_info & PGT_type_mask);
errors++;
continue;
 }
@@ -509,24 +508,24 @@ static int audit(void)
 /* Check the page owner. */
 if ( page_get_owner(pg) != dom_cow )
 {
-   MEM_SHARING_DEBUG("mfn %lx shared, but wrong owner (%hu)!\n",
- mfn_x(mfn), page_get_owner(pg)->domain_id);
+   gdprintk(XENLOG_ERR, "mfn %lx shared, but wrong owner (%hu)!\n",
+mfn_x(mfn), page_get_owner(pg)->domain_id);
errors++;
 }
 
 /* Check the m2p entry */
 if ( !SHARED_M2P(get_gpfn_from_mfn(mfn_x(mfn))) )
 {
-   MEM_SHARING_DEBUG("mfn %lx shared, but wrong m2p entry (%lx)!\n",
- mfn_x(mfn), get_gpfn_from_mfn(mfn_x(mfn)));
+   gdprintk(XENLOG_ERR, "mfn %lx shared, but wrong m2p entry (%lx)!\n",
+mfn_x(mfn), get_gpfn_from_mfn(mfn_x(mfn)));
errors++;
 }
 
 /* Check we have a list */
 if ( (!pg->sharing) || !rmap_has_entries(pg) )
 {
-   MEM_SHARING_DEBUG("mfn %lx shared, but empty gfn list!\n",
- mfn_x(mfn));
+   gdprintk(XENLOG_ERR, "mfn %lx shared, but empty gfn list!\n",
+mfn_x(mfn));
errors++;
continue;
 }
@@ -545,24 +544,26 @@ static int audit(void)
 d = get_domain_by_id(g->domain);
 if ( d == NULL )
 {
-MEM_SHARING_DEBUG("Unknown dom: %hu, for PFN=%lx, MFN=%lx\n",
-  g->domain, g->gfn, mfn_x(mfn));
+gdprintk(XENLOG_ERR,
+ "Unknown dom: %hu, for PFN=%lx, MFN=%lx\n",
+ g->domain, g->gfn, mfn_x(mfn));
 errors++;
 continue;
 }
 o_mfn = get_gfn_query_unlocked(d, g->gfn, &t);
 if ( !mfn_eq(o_mfn, mfn) )
 {
-MEM_SHARING_DEBUG("Incorrect P2M for d=%hu, PFN=%lx."
-  "Expecting MFN=%lx, got %lx\n",
-  g->domain, g->gfn, mfn_x(mfn), mfn_x(o_mfn));
+gdprintk(XENLOG_ERR, "Incorrect P2M for d=%hu, PFN=%lx."
+ "Expecting MFN=%lx, got %lx\n",
+ g->domain, g->gfn, mfn_x(mfn), mfn_x(o_mfn));
 errors++;
 }
 if ( t != p2m_ram_shared )
 {
-MEM_SHARING_DEBUG("Incorrect P2M type for d=%hu, PFN=%lx MFN=%lx."
-  "Expecting t=%d, got %d\n",
-  g->domain, g->gfn, mfn_x(mfn), p2m_ram_shared, t);
+gdprintk(XENLOG_ERR,
+ "Incorrect P2M type for d=%hu, PFN=%lx MFN=%lx."
+ "Expecting t=%d, got %d\n",
+ g->domain, g->gfn, mfn_x(mfn), p2m_ram_shared, t);
 errors++;
 }
 put_domain(d);
@@ -571,10 +572,10 @@ static int audit(void)
 /* The type 

[Xen-devel] [PATCH v2 16/20] x86/mem_sharing: check page type count earlier

2019-12-18 Thread Tamas K Lengyel
Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/mm/mem_sharing.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index b3607b1bce..c44e7f2299 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -649,19 +649,18 @@ static int page_make_sharable(struct domain *d,
 return -EBUSY;
 }
 
-/* Change page type and count atomically */
-if ( !get_page_and_type(page, d, PGT_shared_page) )
+/* Check if page is already typed and bail early if it is */
+if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
 {
 spin_unlock(&d->page_alloc_lock);
-return -EINVAL;
+return -EEXIST;
 }
 
-/* Check it wasn't already sharable and undo if it was */
-if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
+/* Change page type and count atomically */
+if ( !get_page_and_type(page, d, PGT_shared_page) )
 {
 spin_unlock(&d->page_alloc_lock);
-put_page_and_type(page);
-return -EEXIST;
+return -EINVAL;
 }
 
 /*
-- 
2.20.1



[Xen-devel] [PATCH v2 08/20] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables

2019-12-18 Thread Tamas K Lengyel
Create struct mem_sharing_domain under hvm_domain and move mem sharing
variables into it from p2m_domain and hvm_domain.

Expose the mem_sharing_enabled macro to be used consistently across Xen.

Remove some duplicate calls to mem_sharing_enabled in mem_sharing.c

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/mm/mem_sharing.c | 30 +-
 xen/drivers/passthrough/pci.c |  3 +--
 xen/include/asm-x86/hvm/domain.h  |  6 +-
 xen/include/asm-x86/mem_sharing.h | 16 
 xen/include/asm-x86/p2m.h |  4 
 5 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index c0e305ad71..5d81730315 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -197,9 +197,6 @@ static inline shr_handle_t get_next_handle(void)
 return x + 1;
 }
 
-#define mem_sharing_enabled(d) \
-(is_hvm_domain(d) && (d)->arch.hvm.mem_sharing_enabled)
-
 static atomic_t nr_saved_mfns   = ATOMIC_INIT(0);
 static atomic_t nr_shared_mfns  = ATOMIC_INIT(0);
 
@@ -1300,6 +1297,7 @@ int __mem_sharing_unshare_page(struct domain *d,
 int relinquish_shared_pages(struct domain *d)
 {
 int rc = 0;
+struct mem_sharing_domain *msd = &d->arch.hvm.mem_sharing;
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
 unsigned long gfn, count = 0;
 
@@ -1307,7 +1305,7 @@ int relinquish_shared_pages(struct domain *d)
 return 0;
 
 p2m_lock(p2m);
-for ( gfn = p2m->next_shared_gfn_to_relinquish;
+for ( gfn = msd->next_shared_gfn_to_relinquish;
   gfn <= p2m->max_mapped_pfn; gfn++ )
 {
 p2m_access_t a;
@@ -1342,7 +1340,7 @@ int relinquish_shared_pages(struct domain *d)
 {
 if ( hypercall_preempt_check() )
 {
-p2m->next_shared_gfn_to_relinquish = gfn + 1;
+msd->next_shared_gfn_to_relinquish = gfn + 1;
 rc = -ERESTART;
 break;
 }
@@ -1428,7 +1426,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 
 /* Only HAP is supported */
 rc = -ENODEV;
-if ( !hap_enabled(d) || !d->arch.hvm.mem_sharing_enabled )
+if ( !mem_sharing_enabled(d) )
 goto out;
 
 switch ( mso.op )
@@ -1437,10 +1435,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
 shr_handle_t handle;
 
-rc = -EINVAL;
-if ( !mem_sharing_enabled(d) )
-goto out;
-
 rc = nominate_page(d, _gfn(mso.u.nominate.u.gfn), 0, &handle);
 mso.u.nominate.handle = handle;
 }
@@ -1452,9 +1446,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 gfn_t gfn;
 shr_handle_t handle;
 
-rc = -EINVAL;
-if ( !mem_sharing_enabled(d) )
-goto out;
 rc = mem_sharing_gref_to_gfn(d->grant_table, gref, &gfn, NULL);
 if ( rc < 0 )
 goto out;
@@ -1470,10 +1461,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 struct domain *cd;
 shr_handle_t sh, ch;
 
-rc = -EINVAL;
-if ( !mem_sharing_enabled(d) )
-goto out;
-
 rc = rcu_lock_live_remote_domain_by_id(mso.u.share.client_domain,
 &cd);
 if ( rc )
@@ -1540,10 +1527,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 struct domain *cd;
 shr_handle_t sh;
 
-rc = -EINVAL;
-if ( !mem_sharing_enabled(d) )
-goto out;
-
 rc = rcu_lock_live_remote_domain_by_id(mso.u.share.client_domain,
 &cd);
 if ( rc )
@@ -1602,9 +1585,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
   mso.u.range.opaque > mso.u.range.last_gfn) )
 goto out;
 
-if ( !mem_sharing_enabled(d) )
-goto out;
-
 rc = rcu_lock_live_remote_domain_by_id(mso.u.range.client_domain,
 &cd);
 if ( rc )
@@ -1708,7 +1688,7 @@ int mem_sharing_domctl(struct domain *d, struct xen_domctl_mem_sharing_op *mec)
 if ( unlikely(is_iommu_enabled(d) && mec->u.enable) )
 rc = -EXDEV;
 else
-d->arch.hvm.mem_sharing_enabled = mec->u.enable;
+d->arch.hvm.mem_sharing.enabled = mec->u.enable;
 }
 break;
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index c07a63981a..65d1d457ff 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1498,8 +1498,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
 /* Prevent device 

[Xen-devel] [PATCH v2 17/20] xen/mem_sharing: VM forking

2019-12-18 Thread Tamas K Lengyel
VM forking is the process of creating a domain with an empty memory space and a
parent domain specified from which to populate the memory when necessary. For
the new domain to be functional the VM state is copied over as part of the fork
operation (HVM params, hap allocation, etc).
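
As a rough user-space illustration of the idea described above, the sketch below models a fork that starts with no memory and populates entries from its parent chain on first access: a read maps the ancestor's page as shared, a write makes a private copy. It is a toy model only; `struct sketch_dom`, `sketch_find_in_parents` and `sketch_fork_page` are hypothetical names, and plain pointers stand in for p2m entries.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGES     4
#define SKETCH_PAGE_SIZE 16

/* Toy domain: one pointer per guest frame, NULL meaning "no entry". */
struct sketch_dom {
    struct sketch_dom *parent;
    unsigned char *page[SKETCH_PAGES];
    bool owned[SKETCH_PAGES];   /* true: private copy, false: shared */
};

/* Walk up the parent chain until some ancestor has the frame. */
static unsigned char *sketch_find_in_parents(struct sketch_dom *d,
                                             unsigned int gfn)
{
    for (struct sketch_dom *p = d->parent; p; p = p->parent)
        if (p->page[gfn])
            return p->page[gfn];
    return NULL;
}

/*
 * Resolve a fault on gfn. Reads share the ancestor's page, writes get
 * a private copy. Returns 0 on success, -1 if no ancestor backs the
 * frame or the copy cannot be allocated.
 */
static int sketch_fork_page(struct sketch_dom *d, unsigned int gfn,
                            bool write)
{
    unsigned char *src, *copy;

    if (gfn >= SKETCH_PAGES)
        return -1;

    /* Already populated well enough for this kind of access. */
    if (d->page[gfn] && (d->owned[gfn] || !write))
        return 0;

    src = sketch_find_in_parents(d, gfn);
    if (!src)
        return -1;

    if (!write) {
        d->page[gfn] = src;     /* share the ancestor's page */
        d->owned[gfn] = false;
        return 0;
    }

    copy = malloc(SKETCH_PAGE_SIZE);
    if (!copy)
        return -1;

    memcpy(copy, src, SKETCH_PAGE_SIZE);
    d->page[gfn] = copy;        /* private copy for the writer */
    d->owned[gfn] = true;
    return 0;
}
```

Because nothing is copied until it is touched, creating the fork itself is cheap, and the cost is paid per page on first access.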

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/hvm/hvm.c|   2 +-
 xen/arch/x86/mm/mem_sharing.c | 228 ++
 xen/arch/x86/mm/p2m.c |  11 +-
 xen/include/asm-x86/mem_sharing.h |  20 ++-
 xen/include/public/memory.h   |   5 +
 xen/include/xen/sched.h   |   1 +
 6 files changed, 263 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8f90841813..cafd07c67d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1913,7 +1913,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
 }
 #endif
 
-/* Spurious fault? PoD and log-dirty also take this path. */
+/* Spurious fault? PoD, log-dirty and VM forking also take this path. */
 if ( p2m_is_ram(p2mt) )
 {
 rc = 1;
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index c44e7f2299..e93ad2ec5a 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -22,11 +22,13 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -36,6 +38,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
 
 #include "mm-locks.h"
@@ -1423,6 +1428,200 @@ static inline int mem_sharing_control(struct domain *d, 
bool enable)
 return 0;
 }
 
+/*
+ * Forking a page only gets called when the VM faults due to no entry being
+ * in the EPT for the access. Depending on the type of access we either
+ * populate the physmap with a shared entry for read-only access or
+ * fork the page if its a write access.
+ *
+ * The client p2m is already locked so we only need to lock
+ * the parent's here.
+ */
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
+{
+int rc = -ENOENT;
+shr_handle_t handle;
+struct domain *parent;
+struct p2m_domain *p2m;
+unsigned long gfn_l = gfn_x(gfn);
+mfn_t mfn, new_mfn;
+p2m_type_t p2mt;
+struct page_info *page;
+
+if ( !mem_sharing_is_fork(d) )
+return -ENOENT;
+
+parent = d->parent;
+
+if ( !unsharing )
+{
+/* For read-only accesses we just add a shared entry to the physmap */
+while ( parent )
+{
+if ( !(rc = nominate_page(parent, gfn, 0, &handle)) )
+break;
+
+parent = parent->parent;
+}
+
+if ( !rc )
+{
+/* The client's p2m is already locked */
+struct p2m_domain *pp2m = p2m_get_hostp2m(parent);
+
+p2m_lock(pp2m);
+rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false);
+p2m_unlock(pp2m);
+
+if ( !rc )
+return 0;
+}
+}
+
+/*
+ * If it's a write access (ie. unsharing) or if adding a shared entry to
+ * the physmap failed we'll fork the page directly.
+ */
+p2m = p2m_get_hostp2m(d);
+parent = d->parent;
+
+while ( parent )
+{
+mfn = get_gfn_query(parent, gfn_l, &p2mt);
+
+if ( mfn_valid(mfn) && p2m_is_any_ram(p2mt) )
+break;
+
+put_gfn(parent, gfn_l);
+parent = parent->parent;
+}
+
+if ( !parent )
+return -ENOENT;
+
+if ( !(page = alloc_domheap_page(d, 0)) )
+{
+put_gfn(parent, gfn_l);
+return -ENOMEM;
+}
+
+new_mfn = page_to_mfn(page);
+copy_domain_page(new_mfn, mfn);
+set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l);
+
+put_gfn(parent, gfn_l);
+
+return p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K, p2m_ram_rw,
+  p2m->default_access, -1);
+}
+
+static int bring_up_vcpus(struct domain *cd, struct cpupool *cpupool)
+{
+int ret;
+unsigned int i;
+
+if ( (ret = cpupool_move_domain(cd, cpupool)) )
+return ret;
+
+for ( i = 0; i < cd->max_vcpus; i++ )
+{
+if ( cd->vcpu[i] )
+continue;
+
+if ( !vcpu_create(cd, i) )
+return -EINVAL;
+}
+
+domain_update_node_affinity(cd);
+return 0;
+}
+
+static int fork_hap_allocation(struct domain *d, struct domain *cd)
+{
+int rc;
+bool preempted;
+unsigned long mb = hap_get_allocation(d);
+
+if ( mb == hap_get_allocation(cd) )
+return 0;
+
+paging_lock(cd);
+rc = hap_set_allocation(cd, mb << (20 - PAGE_SHIFT), &preempted);
+paging_unlock(cd);
+
+if ( rc )
+return rc;
+
+if ( preempted )
+return -ERESTART;
+
+return 0;
+}
+
+static int fork_hvm(struct domain *d, struct domain *cd)
+{
+int rc, i;
+struct hvm_domain_context c = { 0 };
+uint32_t tsc_mode;
+uint32_t gtsc_khz;
+

[Xen-devel] [PATCH v2 06/20] x86/mem_sharing: drop flags from mem_sharing_unshare_page

2019-12-18 Thread Tamas K Lengyel
All callers pass 0 in.

Signed-off-by: Tamas K Lengyel 
Reviewed-by: Wei Liu 
---
 xen/arch/x86/hvm/hvm.c            | 2 +-
 xen/arch/x86/mm/p2m.c             | 5 ++---
 xen/include/asm-x86/mem_sharing.h | 8 +++-
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1e888b403b..e055114922 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1902,7 +1902,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long 
gla,
 if ( npfec.write_access && (p2mt == p2m_ram_shared) )
 {
 ASSERT(p2m_is_hostp2m(p2m));
-sharing_enomem = mem_sharing_unshare_page(currd, gfn, 0);
+sharing_enomem = mem_sharing_unshare_page(currd, gfn);
 rc = 1;
 goto out_put_gfn;
 }
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 3119269073..baea632acc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -515,7 +515,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, 
unsigned long gfn_l,
  * Try to unshare. If we fail, communicate ENOMEM without
  * sleeping.
  */
-if ( mem_sharing_unshare_page(p2m->domain, gfn_l, 0) < 0 )
+if ( mem_sharing_unshare_page(p2m->domain, gfn_l) < 0 )
 mem_sharing_notify_enomem(p2m->domain, gfn_l, false);
 mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 }
@@ -896,8 +896,7 @@ guest_physmap_add_entry(struct domain *d, gfn_t gfn, mfn_t 
mfn,
 {
 /* Do an unshare to cleanly take care of all corner cases. */
 int rc;
-rc = mem_sharing_unshare_page(p2m->domain,
-  gfn_x(gfn_add(gfn, i)), 0);
+rc = mem_sharing_unshare_page(p2m->domain, gfn_x(gfn_add(gfn, i)));
 if ( rc )
 {
 p2m_unlock(p2m);
diff --git a/xen/include/asm-x86/mem_sharing.h 
b/xen/include/asm-x86/mem_sharing.h
index 7d40e38563..0a9192d0e2 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -70,10 +70,9 @@ int __mem_sharing_unshare_page(struct domain *d,
 
 static inline
 int mem_sharing_unshare_page(struct domain *d,
- unsigned long gfn,
- uint16_t flags)
+ unsigned long gfn)
 {
-int rc = __mem_sharing_unshare_page(d, gfn, flags);
+int rc = __mem_sharing_unshare_page(d, gfn, 0);
 BUG_ON(rc && (rc != -ENOMEM));
 return rc;
 }
@@ -117,8 +116,7 @@ static inline unsigned int 
mem_sharing_get_nr_shared_mfns(void)
 }
 
 static inline
-int mem_sharing_unshare_page(struct domain *d, unsigned long gfn,
- uint16_t flags)
+int mem_sharing_unshare_page(struct domain *d, unsigned long gfn)
 {
 ASSERT_UNREACHABLE();
 return -EOPNOTSUPP;
-- 
2.20.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 04/20] x86/mem_sharing: cleanup code and comments in various locations

2019-12-18 Thread Tamas K Lengyel
No functional changes.

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/hvm/hvm.c            |  11 +-
 xen/arch/x86/mm/mem_sharing.c     | 342 +-
 xen/arch/x86/mm/p2m.c             |  17 +-
 xen/include/asm-x86/mem_sharing.h |  51 +++--
 4 files changed, 236 insertions(+), 185 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5a3a962fbb..1e888b403b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1902,12 +1902,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned 
long gla,
 if ( npfec.write_access && (p2mt == p2m_ram_shared) )
 {
 ASSERT(p2m_is_hostp2m(p2m));
-sharing_enomem = 
-(mem_sharing_unshare_page(currd, gfn, 0) < 0);
+sharing_enomem = mem_sharing_unshare_page(currd, gfn, 0);
 rc = 1;
 goto out_put_gfn;
 }
- 
+
 /* Spurious fault? PoD and log-dirty also take this path. */
 if ( p2m_is_ram(p2mt) )
 {
@@ -1953,9 +1952,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long 
gla,
 __put_gfn(p2m, gfn);
 __put_gfn(hostp2m, gfn);
  out:
-/* All of these are delayed until we exit, since we might 
+/*
+ * All of these are delayed until we exit, since we might
  * sleep on event ring wait queues, and we must not hold
- * locks in such circumstance */
+ * locks in such circumstance.
+ */
 if ( paged )
 p2m_mem_paging_populate(currd, gfn);
 if ( sharing_enomem )
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index efb8821768..319aaf3074 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -59,8 +59,10 @@ static DEFINE_PER_CPU(pg_lock_data_t, __pld);
 #define RMAP_USES_HASHTAB(page) \
 ((page)->sharing->hash_table.flag == NULL)
 #define RMAP_HEAVY_SHARED_PAGE   RMAP_HASHTAB_SIZE
-/* A bit of hysteresis. We don't want to be mutating between list and hash
- * table constantly. */
+/*
+ * A bit of hysteresis. We don't want to be mutating between list and hash
+ * table constantly.
+ */
 #define RMAP_LIGHT_SHARED_PAGE   (RMAP_HEAVY_SHARED_PAGE >> 2)
 
 #if MEM_SHARING_AUDIT
@@ -88,7 +90,7 @@ static inline void page_sharing_dispose(struct page_info 
*page)
 {
 /* Unlikely given our thresholds, but we should be careful. */
 if ( unlikely(RMAP_USES_HASHTAB(page)) )
-free_xenheap_pages(page->sharing->hash_table.bucket, 
+free_xenheap_pages(page->sharing->hash_table.bucket,
 RMAP_HASHTAB_ORDER);
 
 spin_lock(_audit_lock);
@@ -105,7 +107,7 @@ static inline void page_sharing_dispose(struct page_info 
*page)
 {
 /* Unlikely given our thresholds, but we should be careful. */
 if ( unlikely(RMAP_USES_HASHTAB(page)) )
-free_xenheap_pages(page->sharing->hash_table.bucket, 
+free_xenheap_pages(page->sharing->hash_table.bucket,
 RMAP_HASHTAB_ORDER);
 xfree(page->sharing);
 }
@@ -122,8 +124,8 @@ static inline void page_sharing_dispose(struct page_info 
*page)
  * Nesting may happen when sharing (and locking) two pages.
  * Deadlock is avoided by locking pages in increasing order.
  * All memory sharing code paths take the p2m lock of the affected gfn before
- * taking the lock for the underlying page. We enforce ordering between 
page_lock
- * and p2m_lock using an mm-locks.h construct.
+ * taking the lock for the underlying page. We enforce ordering between
+ * page_lock and p2m_lock using an mm-locks.h construct.
  *
  * TODO: Investigate if PGT_validated is necessary.
  */
@@ -168,7 +170,7 @@ static inline bool mem_sharing_page_lock(struct page_info 
*pg)
 if ( rc )
 {
 preempt_disable();
-page_sharing_mm_post_lock(>mm_unlock_level, 
+page_sharing_mm_post_lock(>mm_unlock_level,
   >recurse_count);
 }
 return rc;
@@ -178,7 +180,7 @@ static inline void mem_sharing_page_unlock(struct page_info 
*pg)
 {
 pg_lock_data_t *pld = &(this_cpu(__pld));
 
-page_sharing_mm_unlock(pld->mm_unlock_level, 
+page_sharing_mm_unlock(pld->mm_unlock_level,
                           &pld->recurse_count);
 preempt_enable();
 _page_unlock(pg);
@@ -186,7 +188,7 @@ static inline void mem_sharing_page_unlock(struct page_info 
*pg)
 
 static inline shr_handle_t get_next_handle(void)
 {
-/* Get the next handle get_page style */ 
+/* Get the next handle get_page style */
 uint64_t x, y = next_handle;
 do {
 x = y;
@@ -198,24 +200,26 @@ static inline shr_handle_t get_next_handle(void)
 #define mem_sharing_enabled(d) \
 (is_hvm_domain(d) && (d)->arch.hvm.mem_sharing_enabled)
 
-static atomic_t nr_saved_mfns   = ATOMIC_INIT(0); 
+static atomic_t nr_saved_mfns   = ATOMIC_INIT(0);
 static atomic_t nr_shared_mfns  = ATOMIC_INIT(0);
 
-/** Reverse map **/
-/* Every shared frame keeps a reverse map (rmap) of  tuples that
+/*
+ * Reverse map
+ *
+ * Every 

[Xen-devel] [PATCH v2 00/20] VM forking

2019-12-18 Thread Tamas K Lengyel
The following series implements VM forking for Intel HVM guests to allow for
the fast creation of identical VMs without the associated high startup costs
of booting or restoring the VM from a savefile.

JIRA issue: https://xenproject.atlassian.net/browse/XEN-89

The main design goal with this series has been to reduce the time of creating
the VM fork as much as possible. To achieve this the VM forking process is
split into two steps:
1) forking the VM on the hypervisor side;
2) starting QEMU to handle the backend for emulated devices.

Step 1) involves creating a VM using the new "xl fork-vm" command. The
parent VM is expected to remain paused after forks are created from it (which
is different from what process forking normally entails). During this forking
operation the HVM context and VM settings are copied over to the new forked VM.
This operation is fast and it allows the forked VM to be unpaused and to be
monitored and accessed via VMI. Note however that without its device model
running (depending on what is executing in the VM) it is bound to
misbehave/crash when it tries to access devices that would be emulated by
QEMU. We anticipate that for certain use-cases this would be an acceptable
situation, for example when fuzzing code segments that don't access such
devices.

Step 2) involves launching QEMU to support the forked VM, which requires the
QEMU Xen savefile to be generated manually from the parent VM. This can be
accomplished simply by connecting to its QMP socket and issuing the
"xen-save-devices-state" command as documented by QEMU:
https://github.com/qemu/qemu/blob/master/docs/xen-save-devices-state.txt
Once the QEMU Xen savefile is generated the new "xl fork-launch-dm" command is
used to launch QEMU and load the specified savefile for it.

At runtime the forked VM starts running with an empty p2m which gets lazily
populated when the VM generates EPT faults, similar to how altp2m views are
populated. If the memory access is read-only, the p2m entry is populated with
a shared-memory entry backed by the parent's page. For write accesses, or in
case memory sharing wasn't possible (for example because a reference is held
by a third party), a new page is allocated and the page contents are copied
over from the parent VM. Forks can be further forked if needed, thus
allowing for further memory savings.
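
The lazy population described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions: toy_vm, toy_fault and
the TOY_* constants are hypothetical names made up for illustration, plain
pointers stand in for p2m entries, and none of this is the hypervisor code
from patch 17.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGES     4     /* hypothetical: pages per toy VM */
#define TOY_PAGE_SIZE 16    /* hypothetical: bytes per toy page */

enum toy_state { TOY_EMPTY, TOY_SHARED, TOY_PRIVATE };

struct toy_vm {
    struct toy_vm *parent;              /* NULL for a VM that is not a fork */
    enum toy_state state[TOY_PAGES];    /* per-page population state */
    unsigned char *page[TOY_PAGES];     /* backing memory, if populated */
};

/* Walk up the chain of ancestors until one of them backs the page. */
static unsigned char *toy_find_backing(const struct toy_vm *vm, size_t gfn)
{
    for ( ; vm; vm = vm->parent )
        if ( vm->state[gfn] != TOY_EMPTY )
            return vm->page[gfn];

    return NULL;
}

/*
 * Handle a fault in a fork: a read maps the ancestor's page as shared,
 * a write allocates a private copy. Returns 0 on success, -1 on failure.
 */
int toy_fault(struct toy_vm *vm, size_t gfn, bool write)
{
    unsigned char *src, *copy;

    assert(vm != NULL);

    if ( gfn >= TOY_PAGES || !vm->parent )
        return -1;

    /* Nothing to do if the entry already satisfies this access. */
    if ( vm->state[gfn] == TOY_PRIVATE ||
         (vm->state[gfn] == TOY_SHARED && !write) )
        return 0;

    src = toy_find_backing(vm->parent, gfn);
    if ( !src )
        return -1;

    if ( !write )
    {
        /* Read access: point at the ancestor's page, no copy made. */
        vm->page[gfn] = src;
        vm->state[gfn] = TOY_SHARED;
        return 0;
    }

    /* Write access: give the fork its own copy of the contents. */
    copy = malloc(TOY_PAGE_SIZE);
    if ( !copy )
        return -1;

    memcpy(copy, src, TOY_PAGE_SIZE);
    vm->page[gfn] = copy;
    vm->state[gfn] = TOY_PRIVATE;
    return 0;
}
```

In this model a fork of a fork works the same way, since the lookup simply
continues up the parent chain until some ancestor has the page populated.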

A VM fork reset hypercall is also added that allows the fork to be reset to the
state it was in just after forking. This is an optimization for cases where the
forks are very short-lived and run without a device model, so resetting saves
some time compared to creating a brand new fork.
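
One way to picture the reset operation is the following user-space sketch. It
is an illustration only: toy_fork, toy_fork_reset and the saved-register field
are hypothetical stand-ins, and the real hypervisor-side logic lives in patch
19. The assumption here is that a reset drops every page the fork populated
privately and restores the state captured at fork time, so later accesses
fault the pages in again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGES 4   /* hypothetical: pages per toy fork */

struct toy_fork {
    void *private_page[TOY_PAGES];  /* non-NULL once the fork owns a copy */
    unsigned long regs;             /* stand-in for the current vCPU state */
    unsigned long regs_at_fork;     /* stand-in for the state at fork time */
};

/*
 * Drop every private copy and restore the saved state.
 * Returns the number of pages that were released.
 */
size_t toy_fork_reset(struct toy_fork *f)
{
    size_t gfn, freed = 0;

    assert(f != NULL);

    for ( gfn = 0; gfn < TOY_PAGES; gfn++ )
    {
        if ( !f->private_page[gfn] )
            continue;

        free(f->private_page[gfn]);
        f->private_page[gfn] = NULL;    /* entry is empty again */
        freed++;
    }

    f->regs = f->regs_at_fork;
    return freed;
}
```

The cost of such a reset scales with the number of pages the fork touched
rather than with the size of the VM, which is why it suits short-lived forks.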

The series has been tested with both Linux and Windows VMs and functions as
expected. VM forking time has been measured to be 0.018s, device model launch
to be around 1s depending largely on the number of devices being emulated.

Patches 1-2 implement changes to existing internal Xen APIs to make VM forking
possible.

Patches 3-4 are simple code-formatting fixes for the toolstack and Xen for the
memory sharing paths with no functional changes.

Patches 5-16 are code cleanups and adjustments to the Xen memory sharing
subsystem with no functional changes.

Patch 17 adds the hypervisor-side code implementing VM forking.

Patch 18 integrates mem_access with forked VMs.

Patch 19 implements the hypervisor-side bits of the VM fork reset operation.

Patch 20 adds the toolstack-side code implementing VM forking and reset.

Tamas K Lengyel (20):
  x86: make hvm_{get/set}_param accessible
  xen/x86: Make hap_get_allocation accessible
  tools/libxc: clean up memory sharing files
  x86/mem_sharing: cleanup code and comments in various locations
  x86/mem_sharing: make get_two_gfns take locks conditionally
  x86/mem_sharing: drop flags from mem_sharing_unshare_page
  x86/mem_sharing: don't try to unshare twice during page fault
  x86/mem_sharing: define mem_sharing_domain to hold some scattered
    variables
  x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in
    relinquish_shared_pages
  x86/mem_sharing: Make add_to_physmap static and shorten name
  x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
  x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk
  x86/mem_sharing: ASSERT that p2m_set_entry succeeds
  x86/mem_sharing: Enable mem_sharing on first memop
  x86/mem_sharing: Skip xen heap pages in memshr nominate
  x86/mem_sharing: check page type count earlier
  xen/mem_sharing: VM forking
  xen/mem_access: Use __get_gfn_type_access in set_mem_access
  x86/mem_sharing: reset a fork
  xen/tools: VM forking toolstack side

 tools/libxc/include/xenctrl.h |  30 +-
 tools/libxc/xc_memshr.c       |  34 +-
 tools/libxl/libxl.h           |   7 +
 tools/libxl/libxl_create.c    | 237 +---
 tools/libxl/libxl_dm.c        |   2 +-
 tools/libxl/libxl_dom.c       |  83 ++-
 tools/libxl/libxl_internal.h  |   1 +
 tools/libxl/libxl_types.idl   |   1 +
 

[Xen-devel] [PATCH v2 03/20] tools/libxc: clean up memory sharing files

2019-12-18 Thread Tamas K Lengyel
No functional changes.

Signed-off-by: Tamas K Lengyel 
Acked-by: Wei Liu 
---
 tools/libxc/include/xenctrl.h | 24
 tools/libxc/xc_memshr.c       | 12 ++--
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index f4431687b3..b5ffa53d55 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2060,7 +2060,7 @@ int xc_monitor_emulate_each_rep(xc_interface *xch, 
uint32_t domain_id,
  *
  * Sharing is supported only on the x86 architecture in 64 bit mode, with
  * Hardware-Assisted Paging (i.e. Intel EPT, AMD NPT). Moreover, AMD NPT
- * support is considered experimental. 
+ * support is considered experimental.
 
  * Calls below return ENOSYS if not in the x86_64 architecture.
  * Calls below return ENODEV if the domain does not support HAP.
@@ -2107,13 +2107,13 @@ int xc_memshr_control(xc_interface *xch,
  *  EINVAL or EACCESS if the request is denied by the security policy
  */
 
-int xc_memshr_ring_enable(xc_interface *xch, 
+int xc_memshr_ring_enable(xc_interface *xch,
   uint32_t domid,
   uint32_t *port);
 /* Disable the ring for ENOMEM communication.
  * May fail with EINVAL if the ring was not enabled in the first place.
  */
-int xc_memshr_ring_disable(xc_interface *xch, 
+int xc_memshr_ring_disable(xc_interface *xch,
uint32_t domid);
 
 /*
@@ -2126,7 +2126,7 @@ int xc_memshr_ring_disable(xc_interface *xch,
 int xc_memshr_domain_resume(xc_interface *xch,
 uint32_t domid);
 
-/* Select a page for sharing. 
+/* Select a page for sharing.
  *
  * A 64 bit opaque handle will be stored in handle.  The hypervisor ensures
  * that if the page is modified, the handle will be invalidated, and future
@@ -2155,7 +2155,7 @@ int xc_memshr_nominate_gref(xc_interface *xch,
 
 /* The three calls below may fail with
  * 10 (or -XENMEM_SHARING_OP_S_HANDLE_INVALID) if the handle passed as source
- * is invalid.  
+ * is invalid.
  * 9 (or -XENMEM_SHARING_OP_C_HANDLE_INVALID) if the handle passed as client is
  * invalid.
  */
@@ -2168,7 +2168,7 @@ int xc_memshr_nominate_gref(xc_interface *xch,
  *
  * After successful sharing, the client handle becomes invalid. Both  tuples point to the same mfn with the same handle, the one specified as
- * source. Either 3-tuple can be specified later for further re-sharing. 
+ * source. Either 3-tuple can be specified later for further re-sharing.
  */
 int xc_memshr_share_gfns(xc_interface *xch,
 uint32_t source_domain,
@@ -2193,7 +2193,7 @@ int xc_memshr_share_grefs(xc_interface *xch,
 /* Allows to add to the guest physmap of the client domain a shared frame
  * directly.
  *
- * May additionally fail with 
+ * May additionally fail with
  *  9 (-XENMEM_SHARING_OP_C_HANDLE_INVALID) if the physmap entry for the gfn is
  *  not suitable.
  *  ENOMEM if internal data structures cannot be allocated.
@@ -,7 +,7 @@ int xc_memshr_range_share(xc_interface *xch,
   uint64_t last_gfn);
 
 /* Debug calls: return the number of pages referencing the shared frame backing
- * the input argument. Should be one or greater. 
+ * the input argument. Should be one or greater.
  *
  * May fail with EINVAL if there is no backing shared frame for the input
  * argument.
@@ -2235,9 +2235,9 @@ int xc_memshr_debug_gref(xc_interface *xch,
  uint32_t domid,
  grant_ref_t gref);
 
-/* Audits the share subsystem. 
- * 
- * Returns ENOSYS if not supported (may not be compiled into the hypervisor). 
+/* Audits the share subsystem.
+ *
+ * Returns ENOSYS if not supported (may not be compiled into the hypervisor).
  *
  * Returns the number of errors found during auditing otherwise. May be (should
  * be!) zero.
@@ -2273,7 +2273,7 @@ long xc_sharing_freed_pages(xc_interface *xch);
  * should return 1. (And dominfo(d) for each of the two domains should return 1
  * as well).
  *
- * Note that some of these sharing_used_frames may be referenced by 
+ * Note that some of these sharing_used_frames may be referenced by
  * a single domain page, and thus not realize any savings. The same
  * applies to some of the pages counted in dominfo(d)->shr_pages.
  */
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index d5e135e0d9..5ef56a6933 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -41,7 +41,7 @@ int xc_memshr_control(xc_interface *xch,
 return do_domctl(xch, );
 }
 
-int xc_memshr_ring_enable(xc_interface *xch, 
+int xc_memshr_ring_enable(xc_interface *xch,
   uint32_t domid,
   uint32_t *port)
 {
@@ -57,7 +57,7 @@ int xc_memshr_ring_enable(xc_interface *xch,
port);
 }
 
-int xc_memshr_ring_disable(xc_interface *xch, 
+int xc_memshr_ring_disable(xc_interface *xch,
  

[Xen-devel] [PATCH v2 10/20] x86/mem_sharing: Make add_to_physmap static and shorten name

2019-12-18 Thread Tamas K Lengyel
It's not being called from outside mem_sharing.c

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/mm/mem_sharing.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 1b7b520ccf..fc1d8be1eb 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1064,8 +1064,9 @@ err_out:
 return ret;
 }
 
-int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, 
shr_handle_t sh,
-   struct domain *cd, unsigned long cgfn, bool 
lock)
+static
+int add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
+   struct domain *cd, unsigned long cgfn, bool lock)
 {
 struct page_info *spage;
 int ret = -EINVAL;
@@ -1558,7 +1559,7 @@ int 
mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 sh  = mso.u.share.source_handle;
 cgfn= mso.u.share.client_gfn;
 
-rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn, true);
+rc = add_to_physmap(d, sgfn, sh, cd, cgfn, true);
 
 rcu_unlock_domain(cd);
 }
-- 
2.20.1



[Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible

2019-12-18 Thread Tamas K Lengyel
Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
exposing hvm_{get/set}_param it will be possible for VM forking to copy the
parameters directly into the clone domain.
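
As a rough illustration of why direct accessors help, the sketch below copies
a parameter array from a parent into a clone through get/set helpers that
bounds-check the index, much like the index check in hvm_set_param below. All
names here (toy_domain, toy_get_param, toy_set_param, toy_copy_params,
TOY_NR_PARAMS) are hypothetical and only model the idea in user space; this is
not the code the forking patch uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_PARAMS 8   /* hypothetical; stands in for HVM_NR_PARAMS */

struct toy_domain {
    uint64_t params[TOY_NR_PARAMS];
};

/* Read one parameter, rejecting out-of-range indexes. */
int toy_get_param(const struct toy_domain *d, uint32_t index, uint64_t *value)
{
    if ( index >= TOY_NR_PARAMS )
        return -EINVAL;

    *value = d->params[index];
    return 0;
}

/* Write one parameter, rejecting out-of-range indexes. */
int toy_set_param(struct toy_domain *d, uint32_t index, uint64_t value)
{
    if ( index >= TOY_NR_PARAMS )
        return -EINVAL;

    d->params[index] = value;
    return 0;
}

/* Copy every parameter from the parent into the clone, stopping on error. */
int toy_copy_params(const struct toy_domain *parent, struct toy_domain *clone)
{
    uint32_t i;

    assert(parent != NULL && clone != NULL);

    for ( i = 0; i < TOY_NR_PARAMS; i++ )
    {
        uint64_t value;
        int rc = toy_get_param(parent, i, &value);

        if ( rc )
            return rc;

        rc = toy_set_param(clone, i, value);
        if ( rc )
            return rc;
    }

    return 0;
}
```

Going through the setter rather than copying the array wholesale keeps any
per-parameter validation in one place, at the cost of one call per parameter.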

Signed-off-by: Tamas K Lengyel 
---
 xen/arch/x86/hvm/hvm.c        | 169 --
 xen/include/asm-x86/hvm/hvm.h |   4 +
 2 files changed, 106 insertions(+), 67 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 614ed60fe4..5a3a962fbb 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4072,16 +4072,17 @@ static int hvmop_set_evtchn_upcall_vector(
 }
 
 static int hvm_allow_set_param(struct domain *d,
-   const struct xen_hvm_param *a)
+   uint32_t index,
+   uint64_t new_value)
 {
-uint64_t value = d->arch.hvm.params[a->index];
+uint64_t value = d->arch.hvm.params[index];
 int rc;
 
 rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param);
 if ( rc )
 return rc;
 
-switch ( a->index )
+switch ( index )
 {
 /* The following parameters can be set by the guest. */
 case HVM_PARAM_CALLBACK_IRQ:
@@ -4114,7 +4115,7 @@ static int hvm_allow_set_param(struct domain *d,
 if ( rc )
 return rc;
 
-switch ( a->index )
+switch ( index )
 {
 /* The following parameters should only be changed once. */
 case HVM_PARAM_VIRIDIAN:
@@ -4124,7 +4125,7 @@ static int hvm_allow_set_param(struct domain *d,
 case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
 case HVM_PARAM_ALTP2M:
 case HVM_PARAM_MCA_CAP:
-if ( value != 0 && a->value != value )
+if ( value != 0 && new_value != value )
 rc = -EEXIST;
 break;
 default:
@@ -4134,13 +4135,11 @@ static int hvm_allow_set_param(struct domain *d,
 return rc;
 }
 
-static int hvmop_set_param(
+int hvmop_set_param(
 XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg)
 {
-struct domain *curr_d = current->domain;
 struct xen_hvm_param a;
 struct domain *d;
-struct vcpu *v;
 int rc;
 
 if ( copy_from_guest(&a, arg, 1) )
@@ -4160,23 +4159,42 @@ static int hvmop_set_param(
 if ( !is_hvm_domain(d) )
 goto out;
 
-rc = hvm_allow_set_param(d, &a);
+rc = hvm_set_param(d, a.index, a.value);
+
+ out:
+rcu_unlock_domain(d);
+return rc;
+}
+
+int hvm_set_param(
+struct domain *d,
+uint32_t index,
+uint64_t value)
+{
+struct domain *curr_d = current->domain;
+int rc;
+struct vcpu *v;
+
+if ( index >= HVM_NR_PARAMS )
+return -EINVAL;
+
+rc = hvm_allow_set_param(d, index, value);
 if ( rc )
 goto out;
 
-switch ( a.index )
+switch ( index )
 {
 case HVM_PARAM_CALLBACK_IRQ:
-hvm_set_callback_via(d, a.value);
+hvm_set_callback_via(d, value);
 hvm_latch_shinfo_size(d);
 break;
 case HVM_PARAM_TIMER_MODE:
-if ( a.value > HVMPTM_one_missed_tick_pending )
+if ( value > HVMPTM_one_missed_tick_pending )
 rc = -EINVAL;
 break;
 case HVM_PARAM_VIRIDIAN:
-if ( (a.value & ~HVMPV_feature_mask) ||
- !(a.value & HVMPV_base_freq) )
+if ( (value & ~HVMPV_feature_mask) ||
+ !(value & HVMPV_base_freq) )
 rc = -EINVAL;
 break;
 case HVM_PARAM_IDENT_PT:
@@ -4186,7 +4204,7 @@ static int hvmop_set_param(
  */
 if ( !paging_mode_hap(d) || !cpu_has_vmx )
 {
-d->arch.hvm.params[a.index] = a.value;
+d->arch.hvm.params[index] = value;
 break;
 }
 
@@ -4201,7 +4219,7 @@ static int hvmop_set_param(
 
 rc = 0;
 domain_pause(d);
-d->arch.hvm.params[a.index] = a.value;
+d->arch.hvm.params[index] = value;
 for_each_vcpu ( d, v )
 paging_update_cr3(v, false);
 domain_unpause(d);
@@ -4210,23 +4228,23 @@ static int hvmop_set_param(
 break;
 case HVM_PARAM_DM_DOMAIN:
 /* The only value this should ever be set to is DOMID_SELF */
-if ( a.value != DOMID_SELF )
+if ( value != DOMID_SELF )
 rc = -EINVAL;
 
-a.value = curr_d->domain_id;
+value = curr_d->domain_id;
 break;
 case HVM_PARAM_ACPI_S_STATE:
 rc = 0;
-if ( a.value == 3 )
+if ( value == 3 )
 hvm_s3_suspend(d);
-else if ( a.value == 0 )
+else if ( value == 0 )
 hvm_s3_resume(d);
 else
 rc = -EINVAL;
 
 break;
 case HVM_PARAM_ACPI_IOPORTS_LOCATION:
-rc = pmtimer_change_ioport(d, a.value);
+rc = pmtimer_change_ioport(d, value);
 break;
 case HVM_PARAM_MEMORY_EVENT_CR0:
 case HVM_PARAM_MEMORY_EVENT_CR3:
@@ -4241,24 +4259,24 @@ static int hvmop_set_param(
 rc = xsm_hvm_param_nested(XSM_PRIV, d);
 if ( rc )
 break;
-if ( 

Re: [Xen-devel] [PATCH] x86/save: reserve HVM save record numbers that have been consumed...

2019-12-18 Thread Wei Liu
On Wed, Dec 18, 2019 at 04:09:25PM +, Paul Durrant wrote:
> ...for patches not (yet) upstream.
> 
> This patch is simply reserving save record number space to avoid the
> risk of clashes between existent downstream changes made by Amazon and
> future upstream changes which may be incompatible.
> 
> Signed-off-by: Paul Durrant 

Reviewed-by: Wei Liu 


Re: [Xen-devel] [ANNOUNCEMENT] Xen 4.13 is released

2019-12-18 Thread Sander Eikelenboom
On 18/12/2019 18:00, Juergen Gross wrote:
> Dear community members,
> 
> I'm pleased to announce that Xen 4.13.0 is released.
>  
> Thanks everyone who contributed to this release. This release would
> not have happened without all the awesome contributions from around
> the globe.
> 
> Regards,
> 
> Juergen Gross (on behalf of the Xen Project Hypervisor team)

Thanks for your work as release manager !

--
Sander



[Xen-devel] [PATCH v13 5/5] xen/blkback: Consistently insert one empty line between functions

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

The number of empty lines between functions in xenbus.c is
inconsistent.  This trivial style cleanup commit fixes the file to
consistently place only one empty line.

Acked-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/xenbus.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 24172c180f5f..c7f820db190a 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -432,7 +432,6 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev)
device_remove_file(>dev, _attr_physical_device);
 }
 
-
 static void xen_vbd_free(struct xen_vbd *vbd)
 {
if (vbd->bdev)
@@ -489,6 +488,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
blkif_vdev_t handle,
handle, blkif->domid);
return 0;
 }
+
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
struct backend_info *be = dev_get_drvdata(>dev);
@@ -572,6 +572,7 @@ static void xen_blkbk_discard(struct xenbus_transaction 
xbt, struct backend_info
if (err)
dev_warn(>dev, "writing feature-discard (%d)", err);
 }
+
 int xen_blkbk_barrier(struct xenbus_transaction xbt,
  struct backend_info *be, int state)
 {
@@ -656,7 +657,6 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
return err;
 }
 
-
 /*
  * Callback received when the hotplug scripts have placed the physical-device
  * node.  Read it and the mode node, and create a vbd.  If the frontend is
@@ -748,7 +748,6 @@ static void backend_changed(struct xenbus_watch *watch,
}
 }
 
-
 /*
  * Callback received when the frontend's state changes.
  */
@@ -823,7 +822,6 @@ static void frontend_changed(struct xenbus_device *dev,
}
 }
 
-
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
 static unsigned int buffer_squeeze_duration_ms = 10;
 module_param_named(buffer_squeeze_duration_ms,
@@ -846,7 +844,6 @@ static void reclaim_memory(struct xenbus_device *dev)
 
 /* ** Connection ** */
 
-
 /*
  * Write the physical details regarding the block device to the store, and
  * switch to Connected state.
-- 
2.17.1



[Xen-devel] [PATCH v13 4/5] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

A few static variables in blkback have a 'xen_blkif_' prefix, though it
is unnecessary for static variables.  This commit removes such prefixes.

Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 79f677aeb5cc..fbd67f8e4e4e 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt 
*persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
@@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring 
*ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, blkif->buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
continue;
}
if (use_persistent_gnts &&
-   ring->persistent_gnt_c < xen_blkif_max_pgrants) {
+   

[Xen-devel] [PATCH v13 0/5] xenbus/backend: Add memory pressure handler callback

2019-12-18 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and then introduces a lock to avoid race
conditions (patch 2).  After that, patch 3 applies the callback
mechanism to mitigate the problem in 'xen-blkback'.  The fourth and
fifth patches are trivial cleanups; those fix nits we found during the
development of this patchset.

Note that patches 1, 4, and 5 are not changed since v9.


Base Version
------------

This patch is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/patches/blkback/buffer_squeeze/v13


Patch History
-------------

Changes from v12
(https://lore.kernel.org/xen-devel/20191218104232.9606-1-sjp...@amazon.com/)
 - Do not unnecessarily disable interrupts (suggested by Juergen)
 - Hold lock from xenbus side (suggested by Juergen)

Changes from v11
(https://lore.kernel.org/xen-devel/20191217160748.693-2-sjp...@amazon.com/)
 - Fix wrong trylock use (reported by Juergen)
 - Merge patch 3 and 4 (suggested by Juergen)
 - Update test result

Changes from v10
(https://lore.kernel.org/xen-devel/20191216124527.30306-1-sjp...@amazon.com/)
 - Fix race condition (reported by SeongJae, suggested by Juergen)

Changes from v9
(https://lore.kernel.org/xen-devel/20191213153546.17425-1-sjp...@amazon.de/)
 - Add 'Reviewed-by' and 'Acked-by' from Roger Pau Monné
 - Update the commit message for overhead test of the 2nd patch

Changes from v8
(https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
 - Drop 'Reviewed-by: Juergen' from the second patch
   (suggested by Roger Pau Monné)
 - Update contact of the new module param to SeongJae Park
   (suggested by Roger Pau Monné)
 - Wordsmith the description of the parameter
   (suggested by Roger Pau Monné)
 - Fix dumb bugs
   (suggested by Roger Pau Monné)
 - Move module param definition to xenbus.c and reduce the number of
   lines for this change
   (suggested by Roger Pau Monné)
 - Add a comment for the new callback, reclaim_memory, as other
   callbacks also have
 - Add another trivial cleanup of xenbus.c file (4th patch)

Changes from v7
(https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
 - Update sysfs-driver-xen-blkback for new parameter
   (suggested by Roger Pau Monné)
 - Use per-xen_blkif buffer_squeeze_end instead of global variable
   (suggested by Roger Pau Monné)

Changes from v6
(https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
 - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
 - Constify a variable (suggested by Roger Pau Monné)
 - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
 - More wordsmith of the commit message (suggested by Roger Pau Monné)

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
 - Wordsmith the commit messages (suggested by Roger Pau Monné)
 - Change the reclaim callback return type (suggested by Roger Pau
   Monné)
 - Change the type of the blkback squeeze duration variable
   (suggested by Roger Pau Monné)
 - Add a patch for removal of unnecessary static variable name prefixes
   (suggested by Roger Pau Monné)
 - Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback (suggested by Juergen
   Gross)
 - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen
   Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity
   (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily`
   (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch


SeongJae Park (5):
  xenbus/backend: Add memory pressure handler callback
  xenbus/backend: Protect xenbus callback with lock
  xen/blkback: Squeeze page pools if a memory pressure is detected
  xen/blkback: Remove unnecessary static variable name prefixes
  xen/blkback: Consistently insert one empty line between functions

 .../ABI/testing/sysfs-driver-xen-blkback  | 10 

[Xen-devel] [PATCH v13 1/5] xenbus/backend: Add memory pressure handler callback

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  If memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.

Note that the callback facility could be improved to handle general
pressure in more sophisticated ways.  For example, it would be possible
to monitor the memory consumption of each device and issue the release
requests only to the devices which are causing the pressure.  Also, the
callback could be extended to handle not only memory, but general
resources.  Nevertheless, this version of the implementation defers such
sophisticated goals to future work.

Reviewed-by: Juergen Gross 
Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
 	return NOTIFY_DONE;
 }
 
+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+	const struct xenbus_driver *drv;
+
+	if (!dev->driver)
+		return 0;
+	drv = to_xenbus_driver(dev->driver);
+	if (drv && drv->reclaim_memory)
+		drv->reclaim_memory(to_xenbus_device(dev));
+	return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+				struct shrink_control *sc)
+{
+	bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+			backend_reclaim_memory);
+	return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+	.count_objects = backend_shrink_memory_count,
+	.seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
 	static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
 	register_xenstore_notifier(&xenstore_notifier);
 
+	if (register_shrinker(&backend_memory_shrinker))
+		pr_warn("shrinker registration failed\n");
+
 	return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
 	struct device_driver driver;
 	int (*read_otherend_details)(struct xenbus_device *dev);
 	int (*is_ready)(struct xenbus_device *dev);
+	void (*reclaim_memory)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
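
For readers following along, the dispatch that patch 1/5 adds can be
sketched in plain userspace C.  This is only an illustration under
stated assumptions, not the kernel code: 'demo_device', 'demo_driver',
'demo_blk_reclaim' and 'demo_reclaim_all' are hypothetical names, and a
simple array walk stands in for bus_for_each_dev().  It shows the one
property the commit message relies on: devices without a bound driver,
and drivers without a 'reclaim_memory' callback, are silently skipped.

```c
#include <assert.h>
#include <stddef.h>

struct demo_device;

/* Hypothetical stand-in for 'struct xenbus_driver'. */
struct demo_driver {
	/* Optional: drivers that cannot give memory back leave this NULL. */
	void (*reclaim_memory)(struct demo_device *dev);
};

/* Hypothetical stand-in for 'struct xenbus_device'. */
struct demo_device {
	const struct demo_driver *drv;	/* NULL while no driver is bound */
	unsigned long free_pages;	/* pages the driver keeps cached */
};

/* A block-backend-like callback: give every cached page back. */
static void demo_blk_reclaim(struct demo_device *dev)
{
	dev->free_pages = 0;
}

static const struct demo_driver demo_blk_driver = {
	.reclaim_memory = demo_blk_reclaim,
};

/* A driver that does not implement the optional callback. */
static const struct demo_driver demo_plain_driver = {
	.reclaim_memory = NULL,
};

/*
 * Ask every device to release memory, as the shrinker hook does on
 * memory pressure.  Returns how many callbacks were actually invoked.
 */
static size_t demo_reclaim_all(struct demo_device *devs, size_t n)
{
	size_t i, called = 0;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const struct demo_driver *drv = devs[i].drv;

		if (!drv || !drv->reclaim_memory)
			continue;	/* unbound device or no callback */
		drv->reclaim_memory(&devs[i]);
		called++;
	}
	return called;
}
```

Calling demo_reclaim_all() on an array holding one 'demo_blk_driver'
device, one 'demo_plain_driver' device and one unbound device invokes
exactly one callback and leaves the other two devices untouched.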

[Xen-devel] [PATCH v13 4/5] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

A few static variables in blkback have the 'xen_blkif_' prefix, though
it is unnecessary for static variables.  This commit removes such
prefixes.

Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 79f677aeb5cc..fbd67f8e4e4e 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt 
*persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
@@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring 
*ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, blkif->buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
continue;
}
if (use_persistent_gnts &&
-   ring->persistent_gnt_c < xen_blkif_max_pgrants) {
+   
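
As a side note on the purge_persistent_gnt() hunk above, the arithmetic
behind 'num_clean' is easy to misread because of the integer division.
The following is a minimal userspace sketch of that calculation, not
the driver code itself.  'demo_num_clean' and 'DEMO_LRU_PERCENT_CLEAN'
are hypothetical names, and the value 5 for the percentage is an
assumption, since the diff only shows the macro's name.

```c
#include <assert.h>

/* Assumed value; the diff above only shows the name LRU_PERCENT_CLEAN. */
#define DEMO_LRU_PERCENT_CLEAN 5

/*
 * How many persistent grants a purge pass would try to clean, following
 * the same steps as the hunk above.  'overflow' plays the role of
 * vbd.overflow_max_grants.
 */
static unsigned int demo_num_clean(unsigned int persistent_gnt_c,
				   int max_pgrants, int overflow)
{
	unsigned int limit, num_clean;

	assert(max_pgrants >= 0);	/* the sketch ignores negative limits */
	limit = (unsigned int)max_pgrants;

	if (persistent_gnt_c < limit ||
	    (persistent_gnt_c == limit && !overflow))
		return 0;

	/* Division happens first: 1056 / 100 is 10, so this yields 50. */
	num_clean = (limit / 100) * DEMO_LRU_PERCENT_CLEAN;
	/* Add whatever is already above the limit. */
	num_clean = persistent_gnt_c - limit + num_clean;
	/* Never ask for more than what exists. */
	return num_clean < persistent_gnt_c ? num_clean : persistent_gnt_c;
}
```

With the default limit of 1056 and the assumed 5 percent, a ring sitting
exactly at the limit with the overflow flag set would purge 50 grants,
and a ring holding 1100 grants would purge 94.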

[Xen-devel] [PATCH v13 3/5] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  When handling of the current I/O requests is
finished, or 100 milliseconds have passed since the last I/O request
handling, `blkback` checks the pool and shrinks it so that it does not
exceed the size limit, `max_buffer_pages`.

Therefore, host administrators can cause memory pressure in blkback by
attaching a large number of block devices and inducing I/O.  Such
problematic situations can be avoided by limiting the maximum number of
devices that can be attached, but finding the optimal limit is not so
easy.  An improperly set limit can result in memory pressure or
resource underutilization.  This commit avoids such problematic
situations by squeezing the pools (returning every free page in the pool
to the system) for a while (users can set this duration via a module
parameter) if memory pressure is detected.

Discussions
===========

The `blkback`'s original shrinking mechanism returns to the system only
the pages in the pool that are not currently being used by `blkback`,
in other words, the pages that are not mapped to granted pages.  Because
this commit changes only the shrink limit and still uses the same
freeing mechanism, it does not touch pages which are currently mapping
grants.

Once memory pressure is detected, this commit keeps the squeezing limit
for a user-specified time duration.  The duration should be neither too
long nor too short.  If it is too long, the overhead incurred by the
squeezing can reduce the I/O performance.  If it is too short, `blkback`
will not free enough pages to reduce the memory pressure.  This commit
sets the value to `10 milliseconds` by default because it is a short
time in terms of I/O while it is a long time in terms of memory
operations.  Also, as the original shrinking mechanism runs at least
every 100 milliseconds, this could be a somewhat reasonable choice.  I
also tested other durations (refer to the below section for more
details) and confirmed that 10 milliseconds is the one that works best
with the test.  That said, the proper duration depends on actual
configurations and workloads.  That's why this commit allows users to
set the duration as a module parameter.

Memory Pressure Test
====================

To show how well this commit fixes the memory pressure situation, I
configured a test environment on a xen-running virtualization system.
On the `blkfront` running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those.  Meanwhile, I
measure the number of pages that swapped in (pswpin) and out (pswpout)
on the `blkback` running guest.  The test ran twice, once for the
`blkback` before this commit and once for that after this commit.  As
shown below, this commit has dramatically reduced the memory pressure:

                pswpin  pswpout
    before      76,672  185,799
    after          867    3,967

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

    duration (ms)   pswpin  pswpout
    1               707     5,095
    10              867     3,967
    100             362     3,348

As expected, the memory pressure decreases as the duration increases,
but the reduction becomes slow from `10ms`.  Based on these results, I
chose 10ms as the default duration.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront` running
guest.

For the artificial squeezing, I set the `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
test, I set the value to `1024` and `0`.  The `1024` is the default
value.  Setting the value to `0` is the same as squeezing all the time
(worst-case).

If the underlying block device is slow enough, the squeezing overhead
could be hidden.  For that reason, I use a fast block device, namely the
rbd[1]:

    # xl block-attach guest phy:/dev/ram0 xvdb w

For the I/O performance measurement, I run a simple `dd` command 5 times
directly to the device as below and collect the 'MB/s' results.

    $ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
                             bs=4k count=$((256*512)); sync; done

The results are as below.  'max_pgs' represents the value of the
`blkback.max_buffer_pages` parameter.

    max_pgs   Min   Max   Median   Avg     Stddev
    0         417   423   420      419.4   2.5099801
    1024      414   425   416      417.8   4.4384682
    No difference proven at 95.0% confidence

In short, even worst case squeezing on 

[Xen-devel] [PATCH v13 2/5] xenbus/backend: Protect xenbus callback with lock

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

A driver's 'reclaim_memory' callback can race with 'probe' or 'remove'
because it will be called whenever memory pressure is detected.  To
avoid such a race, this commit embeds a spinlock in each 'xenbus_device'
and makes 'xenbus' hold the lock while the corresponding callbacks are
running.

Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe.c |  8 +++-
 drivers/xen/xenbus/xenbus_probe_backend.c | 10 --
 include/xen/xenbus.h  |  1 +
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 5b471889d723..9ed556ba4fd4 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -232,7 +232,9 @@ int xenbus_dev_probe(struct device *_dev)
 		return err;
 	}
 
+	spin_lock(&dev->reclaim_lock);
 	err = drv->probe(dev, id);
+	spin_unlock(&dev->reclaim_lock);
 	if (err)
 		goto fail;
 
@@ -260,8 +262,11 @@ int xenbus_dev_remove(struct device *_dev)
 
 	free_otherend_watch(dev);
 
-	if (drv->remove)
+	if (drv->remove) {
+		spin_lock(&dev->reclaim_lock);
 		drv->remove(dev);
+		spin_unlock(&dev->reclaim_lock);
+	}
 
 	free_otherend_details(dev);
 
@@ -472,6 +477,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
 		goto fail;
 
 	dev_set_name(&xendev->dev, "%s", devname);
+	spin_lock_init(&xendev->reclaim_lock);
 
 	/* Register with generic device framework. */
 	err = device_register(&xendev->dev);
diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index 7e78ebef7c54..bc61372e00a1 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -251,12 +251,18 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
 static int backend_reclaim_memory(struct device *dev, void *data)
 {
 	const struct xenbus_driver *drv;
+	struct xenbus_device *xdev;
 
 	if (!dev->driver)
 		return 0;
 	drv = to_xenbus_driver(dev->driver);
-	if (drv && drv->reclaim_memory)
-		drv->reclaim_memory(to_xenbus_device(dev));
+	if (drv && drv->reclaim_memory) {
+		xdev = to_xenbus_device(dev);
+		if (!spin_trylock(&xdev->reclaim_lock))
+			return 0;
+		drv->reclaim_memory(xdev);
+		spin_unlock(&xdev->reclaim_lock);
+	}
 	return 0;
 }
 
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index c861cfb6f720..45cd61cb6e86 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -76,6 +76,7 @@ struct xenbus_device {
 	enum xenbus_state state;
 	struct completion down;
 	struct work_struct work;
+	spinlock_t reclaim_lock;
 };
 
 static inline struct xenbus_device *to_xenbus_device(struct device *dev)
-- 
2.17.1



[Xen-devel] [xen-unstable-smoke test] 144934: tolerable all pass - PUSHED

2019-12-18 Thread osstest service owner
flight 144934 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144934/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl      13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl      14 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  5c13ed79f3cba200f21e7dfd6ed7f3aa08e4dada
baseline version:
 xen  0e7c69bd3c0b35a677d73843b39522787ccf5a3f

Last test of basis   144931  2019-12-18 12:00:25 Z    0 days
Testing same since   144934  2019-12-18 15:01:21 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 

jobs:
 build-arm64-xsm                                              pass
 build-amd64                                                  pass
 build-armhf                                                  pass
 build-amd64-libvirt                                          pass
 test-armhf-armhf-xl                                          pass
 test-arm64-arm64-xl-xsm                                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass
 test-amd64-amd64-libvirt                                     pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   0e7c69bd3c..5c13ed79f3  5c13ed79f3cba200f21e7dfd6ed7f3aa08e4dada -> smoke


[Xen-devel] [qemu-mainline test] 144925: regressions - FAIL

2019-12-18 Thread osstest service owner
flight 144925 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144925/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-freebsd10-i386 14 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-freebsd10-amd64 14 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-win7-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ovmf-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ovmf-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-win7-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ws16-amd64 13 guest-saverestore fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore fail REGR. vs. 144861

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail REGR. vs. 144861

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10  fail  like 144861
 test-armhf-armhf-libvirt 14 saverestore-support-check  fail  like 144861
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check  fail  like 144861
 test-amd64-amd64-libvirt 13 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt 13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-seattle 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-seattle 14 saverestore-support-check  fail  never pass
 test-amd64-i386-libvirt-xsm 13 migrate-support-check  fail  never pass
 test-amd64-i386-xl-pvshim 12 guest-start  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl 14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit2 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2 14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit1 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit1 14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check  fail  never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail  never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit1 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit1 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl 13 migrate-support-check  fail  never pass
 

Re: [Xen-devel] [PATCH v12 2/5] xenbus/backend: Protect xenbus callback with lock

2019-12-18 Thread SeongJae Park
On Wed, 18 Dec 2019 16:11:51 +0100 "Jürgen Groß"  wrote:

> On 18.12.19 15:40, SeongJae Park wrote:
> > On Wed, 18 Dec 2019 14:30:44 +0100 "Jürgen Groß"  wrote:
> > 
> >> On 18.12.19 13:42, SeongJae Park wrote:
> >>> On Wed, 18 Dec 2019 13:27:37 +0100 "Jürgen Groß"  wrote:
> >>>
> >>>> On 18.12.19 11:42, SeongJae Park wrote:
> >>>>> From: SeongJae Park 
> >>>>>
> >>>>> 'reclaim_memory' callback can race with a driver code as this callback
> >>>>> will be called from any memory pressure detected context.  To deal with
> >>>>> the case, this commit adds a spinlock in the 'xenbus_device'.  Whenever
> >>>>> 'reclaim_memory' callback is called, the lock of the device which passed
> >>>>> to the callback as its argument is locked.  Thus, drivers registering
> >>>>> their 'reclaim_memory' callback should protect the data that might race
> >>>>> with the callback with the lock by themselves.
> >>>>
> >>>> Any reason you don't take the lock around the .probe() and .remove()
> >>>> calls of the backend (xenbus_dev_probe() and xenbus_dev_remove())? This
> >>>> would eliminate the need to do that in each backend instead.
> >>>
> >>> First of all, I would like to keep the critical section as small as
> >>> possible.  With my small test, I could see slightly increasing memory
> >>> pressure as the critical section becomes wider.  Also, some drivers
> >>> might share the data their 'reclaim_memory' callback touches with other
> >>> functions.  I think only the driver owners can know what data is shared
> >>> and what is the minimum critical section to protect it.
> >>
> >> But this kind of serialization can still be added on top.
> > 
> > I'm still worrying about the unnecessarily large critical section, but it
> > might be small enough to be ignored.  If no others have strong objection,
> > I will take the lock around the '->probe()' and '->remove()'.
> 
> The lock is per device, so contention is possible only for the
> reclaim case. In case probe or remove are running reclaim will have
> nothing to free (in probe case nothing is allocated yet, in remove
> case everything should be freed anyway). So the larger critical section
> is no problem at all IMO.

Agreed.  I think I was worrying about a problem that does not really exist.

> 
> >> And with the trylock in the reclaim path I believe you can even avoid
> >> the irq variants of the spinlock. But I might be wrong, so you should
> >> try that with lockdep enabled. If it is working there is no harm done
> >> when making the critical section larger, as memory allocations will
> >> work as before.
> > 
> > Yes, you're right.  I will try test with lockdep.
> 
> Thanks,

Good news, lockdep says it's okay :)

Will post next version soon.


Thanks,
SeongJae Park

> 
> 
> Juergen


Re: [Xen-devel] REGRESSION: Xen 4.13 RC5 fails to bootstrap Dom0 on ARM

2019-12-18 Thread Roman Shaposhnik
Hi,

On Wed, Dec 18, 2019 at 4:56 AM Julien Grall  wrote:
> > So that is, in fact, my first question -- why is Xen not showing
> > available memory in xl info?
>
> I am not entirely sure what exact information you want.
>
> The output you dumped above contains the available memory
> (see "free_memory").
>
> Are you looking for something different?

Just to be clear: I was giving 2G via devicetrees (the same device trees that
would make Linux detect 2G of RAM), hence I was expecting xl info to show
that. Instead I only got 1120M shown by xl info.

> On 18/12/2019 00:04, Roman Shaposhnik wrote:
> >  memory {
> >  device_type = "memory";
> >  reg = <0x0 0x0 0x0 0x5e0 0x0 0x5f0 0x0 0x1000
> > 0x0 0x5f02000 0x0 0xefd000 0x0 0x6e0 0x0 0x60f000 0x0 0x741
> > 0x0 0x1aaf 0x0 0x21f0 0x0 0x10 0x0 0x2200 0x0
> > 0x1c00>;
> >  };
> >
> >  reserved-memory {
> >  ranges;
> >  #size-cells = <0x2>;
> >  #address-cells = <0x2>;
> >
> >  ramoops@21f0 {
> >  ftrace-size = <0x2>;
> >  console-size = <0x2>;
> >  reg = <0x0 0x21f0 0x0 0x10>;
> >  record-size = <0x2>;
> >  compatible = "ramoops";
> >  };
> >
> >  linux,cma {
> >  linux,cma-default;
> >  reusable;
> >  size = <0x0 0x800>;
> >  compatible = "shared-dma-pool";
> >  };
> >  };
> >
> > If you look at the REG -- it does now add up to 2Gb, but booting Xen
> > with it has exactly the
> > same effect as booting it with: reg = <0x0 0x0 0x0 0x8000>;\
>
> If you boot Xen using EFI, the memory information will come from EFI and
> the DT node will be ignored. So unless UEFI is able to pick up the
> modification of the DT memory node, modifying the DT is not going to
> affect anything.

That's a good point, but given that I always go through GRUB, I was
expecting devicetree command to completely overshadow whatever
information UEFI may have. Am I wrong?

> > I am attaching a full log, and I see the following in the logs:
> >
> > (XEN) Allocating 1:1 mappings totalling 720MB for dom0:
> > (XEN) BANK[0] 0x000800-0x001c00 (320MB)
> > (XEN) BANK[1] 0x004000-0x005800 (384MB)
> > (XEN) BANK[2] 0x007b00-0x007c00 (16MB)
> >
> > Which sort of makes sense, I guess -- but I still don't understand
> > where all these ranges
> > are coming from and how come Xen doesn't see the full 2Gb even with various
> > devicetrees I tried.
>
> The ranges above describe the memory given to Dom0. For all the
> memory given to Xen, you want to look at the top of your log:
>
> (XEN) Checking for initrd in /chosen
> (XEN) RAM:  - 05df
> (XEN) RAM: 05f0 - 06dfefff
> (XEN) RAM: 06e0 - 0740efff
> (XEN) RAM: 0741 - 1db8dfff
> (XEN) RAM: 350f - 3dbd2fff
> (XEN) RAM: 3dbd3000 - 3dff
> (XEN) RAM: 4000 - 5a653fff
> (XEN) RAM: 7ada - 7ada3fff
> (XEN) RAM: 7aea8000 - 7afa9fff
> (XEN) RAM: 7afaa000 - 7ec73fff
> (XEN) RAM: 7ec74000 - 7fdddfff
> (XEN) RAM: 7fdde000 - 7fea5fff
> (XEN) RAM: 7fea6000 - 7ff6dfff
> (XEN) RAM: 7000 - 7fff
>
> Looking at the differences with the Linux logs, there is indeed some
> memory not detected by Xen.
>
> On Xen, we only consider as usable memory any EFI descriptor of type
> EfiConventionalMemory, EfiBootServicesCode or EfiBootServicesData.
>
> Linux includes more types here, so this may explain why we see a difference.
>
> While looking at it, I have also noticed that we don't seem to care
> about the memory attribute. I suspect this could be another latent issue
> in Xen if the attribute does not match.

Anything I can do to help debug this? I can run any kind of debug builds, etc.
if needed.

I mean -- at this point it would be really great to get HiKey back to the status
of Xen-on-ARM developer board.

> > Any ideas here would be greatly appreciated!
> >
> > Thanks,
> > Roman.
> >
> > P.S. Any guess at what these mean?
> >
> > (XEN) traps.c:1973:d0v0 HSR=0x93880006 pc=0x008738
> > gva=0x872f2000 gpa=0x0f
> > (XEN) traps.c:1973:d0v0 HSR=0x93880006 pc=0x00b734e558
> > gva=0xb72eb000 gpa=0x0f
> > (XEN) traps.c:1973:d0v0 HSR=0x93880006 pc=0x008f9d2558
> > gva=0x8f96f000 gpa=0x0f
>
> It means that Linux has tried to access something that has not been
> mapped in stage-2. As Dom0 is mapped 1:1, the GPA also gives you the host
> physical address. 
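
The EFI memory-type point made earlier in this message can be illustrated with a small sketch. It is not Xen or Linux code: the descriptor list is invented, and the wider type set (`WIDER_USABLE`) is only an assumption standing in for "Linux includes more types". It shows how counting fewer EFI memory types as usable RAM gives a smaller total from the same memory map.

```python
from collections import namedtuple

# Hypothetical descriptor layout: type name, start address, size in bytes.
EfiDesc = namedtuple("EfiDesc", "type start size")

MiB = 1024 * 1024

# The three types the thread names as usable on Xen.
XEN_USABLE = {
    "EfiConventionalMemory",
    "EfiBootServicesCode",
    "EfiBootServicesData",
}

# Assumption for illustration: a consumer that also counts loader memory.
WIDER_USABLE = XEN_USABLE | {"EfiLoaderCode", "EfiLoaderData"}


def usable_bytes(memmap, usable_types):
    """Sum the sizes of all descriptors whose type is considered usable."""
    return sum(d.size for d in memmap if d.type in usable_types)


# Invented memory map, not taken from the HiKey log.
memmap = [
    EfiDesc("EfiConventionalMemory", 0x00000000, 512 * MiB),
    EfiDesc("EfiLoaderData", 0x20000000, 256 * MiB),
    EfiDesc("EfiBootServicesData", 0x30000000, 128 * MiB),
    EfiDesc("EfiRuntimeServicesData", 0x38000000, 64 * MiB),
]

narrow_total = usable_bytes(memmap, XEN_USABLE) // MiB
wide_total = usable_bytes(memmap, WIDER_USABLE) // MiB
print(narrow_total, wide_total)
```

With this made-up map the narrow filter reports less RAM than the wide one, which is the shape of the discrepancy discussed above; whether it explains the HiKey numbers would need the real EFI memory map.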

Re: [Xen-devel] [PATCH v3 5/7] Add Code Review Guide

2019-12-18 Thread Lars Kurth


On 18/12/2019, 14:29, "Julien Grall"  wrote:

Hi Lars,

On 12/12/2019 21:14, Lars Kurth wrote:
> +### Workflow from an Author's Perspective
> +
> +When code authors receive feedback on their patches, they typically first try
> +to clarify feedback they do not understand. For smaller patches or patch series
> +it makes sense to wait until receiving feedback on the entire series before
> +sending out a new version addressing the changes. For larger series, it may
> +make sense to send out a new revision earlier.
> +
> +As a reviewer, you need some system that he;ps ensure that you address all

Just a small typo: I think you meant "helps" rather than "he;ps".

Cheers,

Thank you: fixed in my working copy.

One thing which occurred to me for reviews like these, where there are no ACKs
or Reviewed-bys, is that I don't actually know whether you as reviewer are
otherwise happy with the remainder of the patch. Normally the Acked-by or
Reviewed-by is a signal that it is.

I am assuming it is, but I think it may be worthwhile pointing out in the
document that, unless stated otherwise, the reviewer is happy with the patch.

Regards
Lars 


Re: [Xen-devel] REGRESSION: Xen 4.13 RC5 fails to bootstrap Dom0 on ARM

2019-12-18 Thread Roman Shaposhnik
On Wed, Dec 18, 2019 at 3:50 AM Julien Grall  wrote:
>
> Hi,
>
> On 18/12/2019 07:36, Roman Shaposhnik wrote:
> > On Tue, Dec 17, 2019 at 6:56 PM Roman Shaposhnik  wrote:
> >> Exactly! That's the other surprising bit -- I noticed that too -- it's not
> >> like Xen doesn't see any of the memory above 1G -- it just doesn't see
> >> enough of it.
> >>
> >> So the question is -- what is Linux doing that Xen doesn't?
> >
> > By the way, speaking of running Xen under ARM/qemu -- here's an interesting
> > observation: when I run qemu-system-aarch64 with -m 4096 option it seems
> > that, again, Linux kernel is perfectly content with having access to 4G of
> > RAM, while Xen only sees about 2G.
>
> Linux and Xen should see close to the same amount of memory as long as
> you are using the same bootloader...

Thanks for confirming. This is what I'm trying to get to on this thread.
Any help would be greatly appreciated!

> > This may actually have something to do with UEFI I guess.
>
> ...  could you confirm whether you are booting Linux using UEFI or not?

The boot sequence in both cases is:
   HiKey l-loader
   HiKey Tianocore EDK2 – UEFI
   GRUB (as a UEFI payload)
   Xen | Linux

GRUB's commands for booting Xen + Dom0:
xen_hypervisor /boot/xen.efi console=dtuart   dom0_mem=640M dom0_max_vcpus=1 dom0_vcpus_pin
xen_module /boot/kernel console=hvc0 root=(hd1,gpt1)/rootfs.img text
devicetree (hd1,gpt4)/eve.dtb
xen_module (hd1,gpt1)/initrd.img

GRUB's commands for booting Linux only:
linux /boot/kernel  console=ttyAMA0 console=ttyAMA1 console=ttyAMA2 console=ttyAMA3 root=PARTUUID=f71bd987-d99a-4c88-9781-cf4c26cae55e rootdelay=3
devicetree (hd1,gpt4)/eve.dtb

So -- nothing boots directly by UEFI -- everything goes through GRUB.

However, my understanding is that GRUB will detect devicetree
information provided by UEFI (even though devicetree command is
supposed to completely replace that). Hence it is possible that Linux
relies on some residuals left in memory by GRUB that Xen doesn't pay
attention to (but this is a pretty wild speculation only).

Thanks,
Roman.


[Xen-devel] [ANNOUNCEMENT] Xen 4.13 is released

2019-12-18 Thread Juergen Gross

Dear community members,

I'm pleased to announce that Xen 4.13.0 is released.

Please find the tarball and its signature at:

  https://downloads.xenproject.org/release/xen/4.13.0/

You can also check out the tag in xen.git:

  https://xenbits.xen.org/git-http/xen.git RELEASE-4.13.0

Git checkout and build instructions can be found at:

https://wiki.xenproject.org/wiki/Xen_Project_4.13_Release_Notes#Build_Requirements

Release notes can be found at:

  https://wiki.xenproject.org/wiki/Xen_Project_4.13_Release_Notes

A summary for 4.13 release documents can be found at:

  https://wiki.xenproject.org/wiki/Category:Xen_4.13

Technical blog post for 4.13 can be found at:

  https://xenproject.org/2019/12/18/whats-new-in-xen-4-13/

Thanks to everyone who contributed to this release. This release would
not have happened without all the awesome contributions from around
the globe.

Regards,

Juergen Gross (on behalf of the Xen Project Hypervisor team)


Re: [Xen-devel] clock source in PV Linux

2019-12-18 Thread Boris Ostrovsky



On 12/18/19 12:36 AM, Roman Shaposhnik wrote:

On Wed, Dec 11, 2019 at 12:41 AM Jan Beulich  wrote:

On 11.12.2019 09:16, Jürgen Groß wrote:

On 11.12.19 08:28, Jan Beulich wrote:

Jürgen, Boris,

I've noticed

<6>clocksource: Switched to clocksource tsc

as the final clocksource related boot message in a PV Dom0's
log with 5.4.2. Is it intentional that it's not the "xen" one
that gets used by default?

I think this is fine. I just tested it and I'm seeing the same in dom0,
while in a PV domU "xen" is used per default.

In dom0 "tsc" should be okay in case it is stable. Or are you expecting
problems with that setting?

Well, first of all I found this surprising. Whether there are problems to
be expected largely depends on the reliability of the "stable" detection
in PV Dom0.

Related question: does this mean that tsc is now default for PVH as well?

The reason I'm asking is because I'm still a bit worried about the
clock drift with tsc.



dom0 will use TSC for either PV or PVH:

xen_time_init():
    /* As Dom0 is never moved, no penalty on using TSC there */
    if (xen_initial_domain())
        xen_clocksource.rating = 275;

But as far as TSC stability goes, I'd think it should be sufficiently checked
by generic TSC init code?




-boris
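
The effect of the rating change quoted above can be sketched as a tiny selection model. This is a simplification, not kernel code: it assumes the clocksource with the highest rating wins, and the values 300 for "tsc" and 400 for the default "xen" rating are assumptions that do not appear in this thread; only 275 comes from the snippet above.

```python
# Assumed ratings (not stated in the thread): tsc = 300, xen default = 400.
TSC_RATING = 300
XEN_DEFAULT_RATING = 400


def xen_clocksource_rating(is_initial_domain, default_rating=XEN_DEFAULT_RATING):
    """Mirror the quoted snippet: dom0 lowers the "xen" rating to 275."""
    return 275 if is_initial_domain else default_rating


def pick_clocksource(ratings):
    """Simplified model: choose the name with the highest rating."""
    return max(ratings, key=ratings.get)


dom0_choice = pick_clocksource(
    {"tsc": TSC_RATING, "xen": xen_clocksource_rating(True)}
)
domu_choice = pick_clocksource(
    {"tsc": TSC_RATING, "xen": xen_clocksource_rating(False)}
)
print(dom0_choice, domu_choice)
```

Under those assumptions the model reproduces what was reported earlier in the thread: "tsc" in dom0 and "xen" in a PV domU. It does not model the case where the TSC is found unstable at boot.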


[Xen-devel] [PATCH] x86/save: reserve HVM save record numbers that have been consumed...

2019-12-18 Thread Paul Durrant
...for patches not (yet) upstream.

This patch is simply reserving save record number space to avoid the
risk of clashes between existent downstream changes made by Amazon and
future upstream changes which may be incompatible.

Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: "Roger Pau Monné" 
---
 xen/include/public/arch-x86/hvm/save.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index b2ad3fcd74..9c7b86678e 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -639,10 +639,12 @@ struct hvm_msr {
 
 #define CPU_MSR_CODE  20
 
+/* Range 22 - 40 reserved for Amazon */
+
 /*
  * Largest type-code in use
  */
-#define HVM_SAVE_CODE_MAX 20
+#define HVM_SAVE_CODE_MAX 40
 
 #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
 
-- 
2.20.1



[Xen-devel] [xen-unstable test] 144924: regressions - FAIL

2019-12-18 Thread osstest service owner
flight 144924 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144924/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-ovmf-amd64 13 guest-saverestore fail REGR. vs. 144905
 test-amd64-amd64-i386-pvgrub 17 guest-localmigrate/x10   fail REGR. vs. 144905

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 17 guest-saverestore.2  fail REGR. vs. 144905
 test-armhf-armhf-xl-rtds 12 guest-start  fail REGR. vs. 144905

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop  fail  like 144905
 test-armhf-armhf-libvirt 14 saverestore-support-check  fail  like 144905
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop  fail  like 144905
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail  like 144905
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail  like 144905
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check  fail  like 144905
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop  fail  like 144905
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop  fail  like 144905
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail  like 144905
 test-amd64-i386-xl-pvshim 12 guest-start  fail  never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt 13 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-xsm 13 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt 13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2 14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit1 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit1 14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl 14 saverestore-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check  fail  never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail  never pass
 test-armhf-armhf-xl-arndale 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale 14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-seattle 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-seattle 14 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd 12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd 13 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit1 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit1 14 saverestore-support-check  fail  never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail  never pass

version targeted for testing:
 xen  704fa1532801bc02c4500462f0b913b3c137db4d
baseline version:
 xen  f50a4f6e244cfc8e773300c03aaf4db391f3028a

Last test of basis   144905  2019-12-17 18:36:21 Z    0 days
Testing same since   144924  2019-12-18 06:43:35 Z    0 days    1 attempts


People who touched revisions under test:
  

Re: [Xen-devel] [PATCH] [tools/hotplug] Use ip on systems where brctl is not available

2019-12-18 Thread Ian Jackson
Steven Haigh writes ("[PATCH] [tools/hotplug] Use ip on systems where brctl is not available"):
> Newer distros like CentOS 8 do not have brctl available. As such, we
> can't use it to configure networking anymore.
> 
> This patch will fall back to 'ip' or 'bridge' commands if brctl is not
> available in the working PATH.

This looks good to me at least in the brctl case.  I have two minor
comments.

For the avoidance of doubt, I guess you have tested this in the
`ip'/`bridge' case ?  How thoroughly ? :-)

> -if [ -z "$bridge" ]
> -then
> -  bridge=$(brctl show | awk 'NR==2{print$1}')
> -
> +if [ -z "$bridge" ]; then

The presumably-unintentional style change makes the review slightly
harder...

> -bridge=$(brctl show | cut -d "
> +if which brctl >&/dev/null; then

Maybe introduce
   have_brctl () { ... }
so we can say
   if have_brctl; then
?

Regards,
Ian.


Re: [Xen-devel] [PATCH] tools/python: Drop test.py

2019-12-18 Thread Lars Kurth


On 18/12/2019, 13:50, "Andrew Cooper"  wrote:

This file hasn't been touched since it was introduced in 2005 (c/s 0c6f36628)
and has a wildly obsolete shebang for Python 2.3.  Most importantly for us is
that it isn't Python 3 compatible.

Drop the file entirely.  Since the 2.3 days, automatic discovery of tests has
been included in standard functionality.  Rewrite the test rule to use
"$(PYTHON) -m unittest discover" which is equivalent.

Dropping test.py drops the only piece of ZPL-2.0 code in the tree.  Drop the
ancillary files, and adjust COPYING to match.

Signed-off-by: Andrew Cooper 
---
CC: Ian Jackson 
CC: Wei Liu 
CC: Lars Kurth 

This wants backporting to 4.13 as soon as practical.

Reviewed-by: Lars Kurth (lars.ku...@citrix.com) - from a licensing perspective




Re: [Xen-devel] [PATCH v2 6/6] x86: implement Hyper-V clock source

2019-12-18 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 18 December 2019 14:43
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Jan Beulich
> ; Andrew Cooper ; Wei Liu
> ; Roger Pau Monné 
> Subject: [PATCH v2 6/6] x86: implement Hyper-V clock source
> 
> Implement a clock source using Hyper-V's reference TSC page.
> 
> Signed-off-by: Wei Liu 
> ---
> v2:
> 1. Address Jan's comments.
> 
> Relevant spec:
> 
> https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf
> 
> Section 12.6.
> ---
>  xen/arch/x86/time.c | 101 
>  1 file changed, 101 insertions(+)
> 
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 216169a025..8b96b2e9a5 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -644,6 +645,103 @@ static struct platform_timesource __initdata
> plt_xen_timer =
>  };
>  #endif
> 
> +#ifdef CONFIG_HYPERV_GUEST
> +/
> + * HYPER-V REFERENCE TSC
> + */
> +
> +static struct ms_hyperv_tsc_page *hyperv_tsc;
> +static struct page_info *hyperv_tsc_page;
> +
> +static int64_t __init init_hyperv_timer(struct platform_timesource *pts)
> +{
> +paddr_t maddr;
> +uint64_t tsc_msr, freq;
> +
> +if ( !(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE) )
> +return 0;
> +
> +hyperv_tsc_page = alloc_domheap_page(NULL, 0);
> +if ( !hyperv_tsc_page )
> +return 0;
> +
> +hyperv_tsc = __map_domain_page_global(hyperv_tsc_page);
> +if ( !hyperv_tsc )
> +{
> +free_domheap_page(hyperv_tsc_page);
> +hyperv_tsc_page = NULL;
> +return 0;
> +}
> +
> +maddr = page_to_maddr(hyperv_tsc_page);
> +
> +/*
> + * Per Hyper-V TLFS:
> + *   1. Read existing MSR value
> + *   2. Preserve bits [11:1]
> + *   3. Set bits [63:12] to be guest physical address of tsc page
> + *   4. Set enabled bit (0)
> + *   5. Write back new MSR value
> + */
> +rdmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr);
> +tsc_msr &= 0xffeULL;
> +tsc_msr |=  maddr | 1 /* enabled */;
> +wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr);
> +

You need to check for the HV_X64_ACCESS_FREQUENCY_MSRS feature or you risk a
#GP below, I think.

> +/* Get TSC frequency from Hyper-V */
> +rdmsrl(HV_X64_MSR_TSC_FREQUENCY, freq);
> +pts->frequency = freq;
> +
> +return freq;
> +}
> +
> +static inline uint64_t read_hyperv_timer(void)
> +{
> +uint64_t scale, offset, ret, tsc;
> +uint32_t seq;
> +const struct ms_hyperv_tsc_page *tsc_page = hyperv_tsc;
> +
> +do {
> +seq = tsc_page->tsc_sequence;
> +
> +/* Seq 0 is special. It means the TSC enlightenment is not
> + * available at the moment. The reference time can only be
> + * obtained from the Reference Counter MSR.
> + */
> +if ( seq == 0 )

Older versions of the spec used to use 0x I think, although when I look 
again they seem to have been retro-actively fixed. In any case I think you 
should treat both 0x and 0 as invalid.

> +{
> +rdmsrl(HV_X64_MSR_TIME_REF_COUNT, ret);
> +return ret;
> +}
> +
> +/* rdtsc_ordered already contains a load fence */
> +tsc = rdtsc_ordered();
> +scale = tsc_page->tsc_scale;
> +offset = tsc_page->tsc_offset;
> +
> +smp_rmb();
> +
> +} while (tsc_page->tsc_sequence != seq);
> +
> +/* ret = ((tsc * scale) >> 64) + offset; */
> +asm ( "mul %[scale]; add %[offset], %[ret]"
> +  : "+a" (tsc), [ret] "=d" (ret)
> +  : [scale] "rm" (scale), [offset] "rm" (offset) );
> +

It would be nice to common this up with scale_tsc() in viridian/time.c.

  Paul

> +return ret;
> +}
> +
> +static struct platform_timesource __initdata plt_hyperv_timer =
> +{
> +.id = "hyperv",
> +.name = "HYPER-V REFERENCE TSC",
> +.read_counter = read_hyperv_timer,
> +.init = init_hyperv_timer,
> +/* See TSC time source for why counter_bits is set to 63 */
> +.counter_bits = 63,
> +};
> +#endif
> +
>  /
>   * GENERIC PLATFORM TIMER INFRASTRUCTURE
>   */
> @@ -793,6 +891,9 @@ static u64 __init init_platform_timer(void)
>  static struct platform_timesource * __initdata plt_timers[] = {
>  #ifdef CONFIG_XEN_GUEST
>  &plt_xen_timer,
> +#endif
> +#ifdef CONFIG_HYPERV_GUEST
> +&plt_hyperv_timer,
>  #endif
>  &plt_hpet, &plt_pmtimer, &plt_pit
>  };
> --
> 2.20.1
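
For readers following the review, the two pieces of arithmetic in the quoted patch can be modelled in a few lines. This is only a sketch under stated assumptions, not the Xen implementation: the TSC page is represented by a plain dict, and `read_tsc` and `read_ref_counter` are hypothetical callables standing in for `rdtsc_ordered()` and the `HV_X64_MSR_TIME_REF_COUNT` read. Only the sequence value 0 is treated as invalid, as in the patch.

```python
MASK64 = (1 << 64) - 1


def new_reference_tsc_msr(old_msr, maddr):
    """Keep bits [11:1], put the page address in bits [63:12], set bit 0."""
    assert maddr & 0xfff == 0, "page address must be 4 KiB aligned"
    return (old_msr & 0xffe) | maddr | 1


def scale_tsc(tsc, scale, offset):
    """ret = ((tsc * scale) >> 64) + offset, truncated to 64 bits."""
    return (((tsc * scale) >> 64) + offset) & MASK64


def read_reference_time(page, read_tsc, read_ref_counter):
    """Retry until tsc_sequence is unchanged across the read of scale/offset."""
    while True:
        seq = page["tsc_sequence"]
        if seq == 0:
            # Sequence 0: the TSC page is not usable, fall back to the MSR.
            return read_ref_counter()
        tsc = read_tsc()
        scale = page["tsc_scale"]
        offset = page["tsc_offset"]
        if page["tsc_sequence"] == seq:
            return scale_tsc(tsc, scale, offset)


# Made-up values: a scale of 2**63 corresponds to multiplying by 0.5.
page = {"tsc_sequence": 1, "tsc_scale": 1 << 63, "tsc_offset": 5}
via_page = read_reference_time(page, lambda: 1000, lambda: 42)

page["tsc_sequence"] = 0
via_msr = read_reference_time(page, lambda: 1000, lambda: 42)
print(via_page, via_msr)
```

Python integers are arbitrary precision, so the 128-bit intermediate product that the patch obtains with `mul` falls out naturally here; the memory barriers in the C code have no equivalent in this single-threaded model.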



Re: [Xen-devel] [PATCH v12 2/5] xenbus/backend: Protect xenbus callback with lock

2019-12-18 Thread Jürgen Groß

On 18.12.19 15:40, SeongJae Park wrote:

On Wed, 18 Dec 2019 14:30:44 +0100 "Jürgen Groß"  wrote:


On 18.12.19 13:42, SeongJae Park wrote:

On Wed, 18 Dec 2019 13:27:37 +0100 "Jürgen Groß"  wrote:


On 18.12.19 11:42, SeongJae Park wrote:

From: SeongJae Park 

'reclaim_memory' callback can race with a driver code as this callback
will be called from any memory pressure detected context.  To deal with
the case, this commit adds a spinlock in the 'xenbus_device'.  Whenever
'reclaim_memory' callback is called, the lock of the device which passed
to the callback as its argument is locked.  Thus, drivers registering
their 'reclaim_memory' callback should protect the data that might race
with the callback with the lock by themselves.


Any reason you don't take the lock around the .probe() and .remove()
calls of the backend (xenbus_dev_probe() and xenbus_dev_remove())? This
would eliminate the need to do that in each backend instead.


First of all, I would like to keep the critical section as small as possible.
With my small test, I could see slightly increasing memory pressure as the
critical section becomes wider.  Also, some drivers might share the data their
'reclaim_memory' callback touches with other functions.  I think only the
driver owners can know what data is shared and what is the minimum critical
section to protect it.


But this kind of serialization can still be added on top.


I'm still worrying about the unnecessarily large critical section, but it might
be small enough to be ignored.  If no others have strong objection, I will take
the lock around the '->probe()' and '->remove()'.


The lock is per device, so contention is possible only for the
reclaim case. In case probe or remove are running reclaim will have
nothing to free (in probe case nothing is allocated yet, in remove
case everything should be freed anyway). So the larger critical section
is no problem at all IMO.


And with the trylock in the reclaim path I believe you can even avoid
the irq variants of the spinlock. But I might be wrong, so you should
try that with lockdep enabled. If it is working there is no harm done
when making the critical section larger, as memory allocations will
work as before.


Yes, you're right.  I will test with lockdep.


Thanks,


Juergen
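
The locking scheme agreed on in this exchange can be sketched in user-space terms. This is an analogy only, assuming a `threading.Lock` in place of the kernel spinlock and a hypothetical `Device` class in place of `xenbus_device`: probe and remove take the per-device lock for their whole duration, while the reclaim path only tries the lock and gives up if it is held.

```python
import threading


class Device:
    """Hypothetical stand-in for a backend device with a per-device lock."""

    def __init__(self):
        self.reclaim_lock = threading.Lock()
        self.cache = []


def probe(dev):
    # The whole probe runs under the lock; nothing is allocated before it.
    with dev.reclaim_lock:
        dev.cache = [bytearray(16) for _ in range(4)]


def remove(dev):
    # The whole remove runs under the lock; everything is freed here anyway.
    with dev.reclaim_lock:
        dev.cache.clear()


def reclaim_memory(dev):
    """Free cached buffers if the lock is available, otherwise do nothing."""
    if not dev.reclaim_lock.acquire(blocking=False):
        return 0  # probe/remove in progress: nothing useful to reclaim
    try:
        freed = len(dev.cache)
        dev.cache.clear()
        return freed
    finally:
        dev.reclaim_lock.release()


dev = Device()
probe(dev)
freed_when_idle = reclaim_memory(dev)

probe(dev)
dev.reclaim_lock.acquire()  # simulate probe/remove holding the lock
freed_when_busy = reclaim_memory(dev)
dev.reclaim_lock.release()
print(freed_when_idle, freed_when_busy)
```

Because the reclaim path never blocks, it cannot deadlock against probe or remove, which is the property the trylock suggestion relies on. Whether the irq-safe spinlock variants can be dropped in the kernel is a separate question that the lockdep run mentioned above is meant to answer.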



Re: [Xen-devel] [PATCH v2 4/6] x86/viridian: drop private copy of HV_REFERENCE_TSC_PAGE in time.c

2019-12-18 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 18 December 2019 14:43
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Paul Durrant
> ; Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: [PATCH v2 4/6] x86/viridian: drop private copy of
> HV_REFERENCE_TSC_PAGE in time.c
> 
> Use the one defined in hyperv-tlfs.h instead. No functional change
> intended.
> 
> Signed-off-by: Wei Liu 
> ---
>  xen/arch/x86/hvm/viridian/time.c | 30 +++---
>  1 file changed, 11 insertions(+), 19 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/viridian/time.c
> b/xen/arch/x86/hvm/viridian/time.c
> index 6ddca29b29..33c15782e4 100644
> --- a/xen/arch/x86/hvm/viridian/time.c
> +++ b/xen/arch/x86/hvm/viridian/time.c
> @@ -13,19 +13,11 @@
> 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include "private.h"
> 
> -typedef struct _HV_REFERENCE_TSC_PAGE
> -{
> -uint32_t TscSequence;
> -uint32_t Reserved1;
> -uint64_t TscScale;
> -int64_t  TscOffset;
> -uint64_t Reserved2[509];
> -} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
> -
>  static void update_reference_tsc(const struct domain *d, bool initialize)
>  {
>  struct viridian_domain *vd = d->arch.hvm.viridian;
> @@ -41,18 +33,18 @@ static void update_reference_tsc(const struct domain
> *d, bool initialize)
>   * This enlightenment must be disabled is the host TSC is not
> invariant.
>   * However it is also disabled if vtsc is true (which means rdtsc is
>   * being emulated). This generally happens when guest TSC freq and
> host
> - * TSC freq don't match. The TscScale value could be adjusted to cope
> + * TSC freq don't match. The tsc_scale value could be adjusted to
> cope
>   * with this, allowing vtsc to be turned off, but support for this is
>   * not yet present in the hypervisor. Thus is it is possible that
>   * migrating a Windows VM between hosts of differing TSC frequencies
>   * may result in large differences in guest performance. Any jump in
>   * TSC due to migration down-time can, however, be compensated for by
> - * setting the TscOffset value (see below).
> + * setting the tsc_offset value (see below).
>   */
>  if ( !host_tsc_is_safe() || d->arch.vtsc )
>  {
>  /*
> - * The specification states that valid values of TscSequence
> range
> + * The specification states that valid values of tsc_sequence
> range
>   * from 0 to 0xFFFE. The value 0x is used to indicate
>   * this mechanism is no longer a reliable source of time and that
>   * the VM should fall back to a different source.
> @@ -61,7 +53,7 @@ static void update_reference_tsc(const struct domain *d,
> bool initialize)
>   * violate the spec. and rely on a value of 0 to indicate that
> this
>   * enlightenment should no longer be used.
>   */
> -p->TscSequence = 0;
> +p->tsc_sequence = 0;
> 
>  printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC:
> invalidated\n",
> d->domain_id);
> @@ -72,29 +64,29 @@ static void update_reference_tsc(const struct domain
> *d, bool initialize)
>   * The guest will calculate reference time according to the following
>   * formula:
>   *
> - * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
> + * ReferenceTime = ((RDTSC() * tsc_scale) >> 64) + tsc_offset
>   *
>   * Windows uses a 100ns tick, so we need a scale which is cpu
>   * ticks per 100ns shifted left by 64.
>   * The offset value is calculated on restore after migration and
>   * ensures that Windows will not see a large jump in ReferenceTime.
>   */
> -p->TscScale = ((1ul << 32) / d->arch.tsc_khz) << 32;
> -p->TscOffset = trc->off;
> +p->tsc_scale = ((1ul << 32) / d->arch.tsc_khz) << 32;
> +p->tsc_offset = trc->off;
>  smp_wmb();
> 
> -seq = p->TscSequence + 1;
> +seq = p->tsc_sequence + 1;
>  if ( seq == 0x || seq == 0 ) /* Avoid both 'invalid' values
> */
>  seq = 1;
> 
> -p->TscSequence = seq;
> +p->tsc_sequence = seq;
>  }
> 
>  /*
>   * The specification says: "The partition reference time is computed
>   * by the following formula:
>   *
> - * ReferenceTime = ((VirtualTsc * TscScale) >> 64) + TscOffset
> + * ReferenceTime = ((VirtualTsc * tsc_scale) >> 64) + tsc_offset

I'd prefer keeping the CamelCase here as it's text lifted from the TLFS and not
reliant on the header definitions.

  Paul

>   *
>   * The multiplication is a 64 bit multiplication, which results in a
>   * 128 bit number which is then shifted 64 times to the right to obtain
> --
> 2.20.1
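
The ReferenceTime formula quoted in the patch comments can be checked with a short worked example. The sketch below is a derivation from the stated 100ns tick, not a copy of the expression in the patch: it assumes the scale is the number of 100ns ticks per TSC tick expressed as a 64-bit fixed-point fraction, and the 2 GHz frequency is an arbitrary illustrative value.

```python
TICKS_PER_SECOND = 10**7  # one reference tick is 100ns


def tsc_scale(tsc_hz):
    """100ns ticks per TSC tick, as a fraction scaled by 2**64."""
    return (TICKS_PER_SECOND << 64) // tsc_hz


def reference_time(tsc, scale, offset=0):
    """ReferenceTime = ((tsc * scale) >> 64) + offset"""
    return ((tsc * scale) >> 64) + offset


tsc_hz = 2_000_000_000  # hypothetical 2 GHz TSC
scale = tsc_scale(tsc_hz)

# One second's worth of TSC ticks should map to about 10**7 reference ticks;
# the integer division in tsc_scale can cost at most one tick here.
one_second = reference_time(tsc_hz, scale)
print(scale, one_second)
```

The scale only fits in 64 bits when the TSC runs faster than the 10 MHz reference clock, which holds for any realistic host. The offset term is what lets a restored guest avoid seeing a jump in ReferenceTime after migration, as the patch comment describes.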



[Xen-devel] [PATCH] tools/python: Python 3 compatibility

2019-12-18 Thread Andrew Cooper
convert-legacy-stream is only used for incoming migration from pre Xen 4.7,
and verify-stream-v2 appears to only be used by me during migration
development - it is little surprise that they missed the main conversion
effort in Xen 4.13.

Fix it all up.

Move open_file_or_fd() into a new util.py to avoid duplication, making it a
more generic wrapper around open() or fdopen().

Signed-off-by: Andrew Cooper 
---
CC: Ian Jackson 
CC: Wei Liu 

This needs backporting to 4.13 ASAP
---
 tools/python/scripts/convert-legacy-stream | 49 +++---
 tools/python/scripts/verify-stream-v2  | 43 +-
 tools/python/xen/migration/libxc.py|  2 +-
 tools/python/xen/migration/libxl.py|  2 +-
 tools/python/xen/migration/verify.py   |  4 +--
 tools/python/xen/util.py   | 23 ++
 6 files changed, 46 insertions(+), 77 deletions(-)
 create mode 100644 tools/python/xen/util.py

diff --git a/tools/python/scripts/convert-legacy-stream b/tools/python/scripts/convert-legacy-stream
index 5f80f13654..b0d81aa92e 100755
--- a/tools/python/scripts/convert-legacy-stream
+++ b/tools/python/scripts/convert-legacy-stream
@@ -5,6 +5,8 @@
 Convert a legacy migration stream to a v2 stream.
 """
 
+from __future__ import print_function
+
 import sys
 import os, os.path
 import syslog
@@ -12,6 +14,7 @@ import traceback
 
 from struct import calcsize, unpack, pack
 
+from xen.util import open_file_or_fd as open_file_or_fd
 from xen.migration import legacy, public, libxc, libxl, xl
 
 __version__ = 1
@@ -39,16 +42,16 @@ def info(msg):
 for line in msg.split("\n"):
 syslog.syslog(syslog.LOG_INFO, line)
 else:
-print msg
+print(msg)
 
 def err(msg):
 """Error message, routed to appropriate destination"""
 if log_to_syslog:
 for line in msg.split("\n"):
 syslog.syslog(syslog.LOG_ERR, line)
-print >> sys.stderr, msg
+print(msg, file = sys.stderr)
 
-class StreamError(StandardError):
+class StreamError(Exception):
 """Error with the incoming migration stream"""
 pass
 
@@ -70,7 +73,7 @@ class VM(object):
 
 # libxl
 self.libxl = fmt == "libxl"
-self.emu_xenstore = "" # NUL terminated key pairs from "toolstack" records
+self.emu_xenstore = b"" # NUL terminated key pairs from "toolstack" records
 
 def write_libxc_ihdr():
 stream_write(pack(libxc.IHDR_FORMAT,
@@ -336,7 +339,7 @@ def read_libxl_toolstack(vm, data):
 if twidth == 64:
 name = name[:-4]
 
-if name[-1] != '\x00':
+if name[-1] != b'\x00':
 raise StreamError("physmap name not NUL terminated")
 
 root = "physmap/%x" % (phys,)
@@ -347,7 +350,7 @@ def read_libxl_toolstack(vm, data):
 for key, val in zip(kv[0::2], kv[1::2]):
 info("'%s' = '%s'" % (key, val))
 
-vm.emu_xenstore += '\x00'.join(kv) + '\x00'
+vm.emu_xenstore += b'\x00'.join(kv) + b'\x00'
 
 
 def read_chunks(vm):
@@ -534,7 +537,7 @@ def read_qemu(vm):
 sig, = unpack("21s", rawsig)
 info("Qemu signature: %s" % (sig, ))
 
-if sig == "DeviceModelRecord0002":
+if sig == b"DeviceModelRecord0002":
 rawsz = rdexact(4)
 sz, = unpack("I", rawsz)
 qdata = rdexact(sz)
@@ -617,36 +620,6 @@ def read_legacy_stream(vm):
 return 2
 return 0
 
-def open_file_or_fd(val, mode):
-"""
-If 'val' looks like a decimal integer, open it as an fd.  If not, try to
-open it as a regular file.
-"""
-
-fd = -1
-try:
-# Does it look like an integer?
-try:
-fd = int(val, 10)
-except ValueError:
-pass
-
-# Try to open it...
-if fd != -1:
-return os.fdopen(fd, mode, 0)
-else:
-return open(val, mode, 0)
-
-except StandardError, e:
-if fd != -1:
-err("Unable to open fd %d: %s: %s" %
-(fd, e.__class__.__name__, e))
-else:
-err("Unable to open file '%s': %s: %s" %
-(val, e.__class__.__name__, e))
-
-raise SystemExit(1)
-
 
 def main():
 from optparse import OptionParser
@@ -723,7 +696,7 @@ def main():
 if __name__ == "__main__":
 try:
 sys.exit(main())
-except SystemExit, e:
+except SystemExit as e:
 sys.exit(e.code)
 except KeyboardInterrupt:
 sys.exit(1)
diff --git a/tools/python/scripts/verify-stream-v2 
b/tools/python/scripts/verify-stream-v2
index 3daf25791e..8355c2d206 100755
--- a/tools/python/scripts/verify-stream-v2
+++ b/tools/python/scripts/verify-stream-v2
@@ -3,12 +3,15 @@
 
 """ Verify a v2 format migration stream """
 
+from __future__ import print_function
+
 import sys
 import struct
 import os, os.path
 import syslog
 import traceback
 
+from xen.util import open_file_or_fd as open_file_or_fd
 from xen.migration.verify import StreamError, 

Re: [Xen-devel] [PATCH v2 3/6] x86/viridian: drop private copy of definitions from synic.c

2019-12-18 Thread Durrant, Paul
> -----Original Message-----
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 18 December 2019 14:43
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Paul Durrant
> ; Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: [PATCH v2 3/6] x86/viridian: drop private copy of definitions
> from synic.c
> 
> Use hyperv-tlfs.h instead. No functional change intended.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Paul Durrant 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/6] x86: import hyperv-tlfs.h from Linux

2019-12-18 Thread Durrant, Paul
> -----Original Message-----
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 18 December 2019 14:42
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Jan Beulich
> ; Andrew Cooper ; Wei Liu
> ; Roger Pau Monné 
> Subject: [PATCH v2 1/6] x86: import hyperv-tlfs.h from Linux
> 
> Take a pristine copy from Linux commit
> b2d8b167e15bb5ec2691d1119c025630a247f649.
> 
> Do the following to fix it up for Xen:
> 
> 1. include xen/types.h and xen/bitops.h
> 2. fix up invocations of BIT macro
> 
> Signed-off-by: Wei Liu 
> Acked-by: Jan Beulich 
[snip]
> +/*
> + * The guest OS needs to register the guest ID with the hypervisor.
> + * The guest ID is a 64 bit entity and the structure of this ID is
> + * specified in the Hyper-V specification:
> + *
> + * msdn.microsoft.com/en-
> us/library/windows/hardware/ff542653%28v=vs.85%29.aspx
> + *
> + * While the current guideline does not specify how Linux guest ID(s)
> + * need to be generated, our plan is to publish the guidelines for
> + * Linux and other guest operating systems that currently are hosted
> + * on Hyper-V. The implementation here conforms to this yet
> + * unpublished guidelines.
> + *
> + *
> + * Bit(s)
> + * 63 - Indicates if the OS is Open Source or not; 1 is Open Source
> + * 62:56 - Os Type; Linux is 0x100
> + * 55:48 - Distro specific identification
> + * 47:16 - Linux kernel version number
> + * 15:0  - Distro specific identification
> + *
> + *

It might be useful to pull the declaration of union viridian_guest_os_id_msr in 
here since the comment is explaining the format.

  Paul
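
The bit layout in the quoted comment can be made concrete with a small
decoder. This is only a sketch: the struct and function names are
hypothetical and this is not the union viridian_guest_os_id_msr
declaration mentioned above. The field boundaries are taken directly from
the quoted comment. Note that a 7-bit field cannot hold the value 0x100
that the comment gives for Linux, so the sketch simply masks seven bits.

```c
#include <stdint.h>

/* Hypothetical decoded view of the 64-bit guest OS ID, following the
 * bit ranges listed in the quoted comment. */
struct guest_os_id_fields {
    uint8_t  open_source;     /* bit 63 */
    uint8_t  os_type;         /* bits 62:56 */
    uint8_t  distro_hi;       /* bits 55:48 */
    uint32_t kernel_version;  /* bits 47:16 */
    uint16_t distro_lo;       /* bits 15:0 */
};

/* Split a raw guest OS ID value into its fields with shifts and masks. */
static struct guest_os_id_fields decode_guest_os_id(uint64_t id)
{
    struct guest_os_id_fields f;

    f.open_source    = (uint8_t)(id >> 63);
    f.os_type        = (uint8_t)((id >> 56) & 0x7f);
    f.distro_hi      = (uint8_t)((id >> 48) & 0xff);
    f.kernel_version = (uint32_t)((id >> 16) & 0xffffffffu);
    f.distro_lo      = (uint16_t)(id & 0xffff);

    return f;
}
```

A caller would pass the raw MSR value and read the individual fields from
the returned struct.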



Re: [Xen-devel] [PATCH v2 2/6] x86/viridian: drop duplicate defines from private.h and viridian.c

2019-12-18 Thread Durrant, Paul
> -----Original Message-----
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 18 December 2019 14:42
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Paul Durrant
> ; Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: [PATCH v2 2/6] x86/viridian: drop duplicate defines from
> private.h and viridian.c
> 
> No functional change intended.
> 
> Signed-off-by: Wei Liu 

[snip]
> diff --git a/xen/arch/x86/hvm/viridian/viridian.c
> b/xen/arch/x86/hvm/viridian/viridian.c
> index 4b06b78a27..76f6b6510b 100644
> --- a/xen/arch/x86/hvm/viridian/viridian.c
> +++ b/xen/arch/x86/hvm/viridian/viridian.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -19,22 +20,10 @@
> 
>  #include "private.h"
> 
> -/* Viridian Hypercall Status Codes. */
> -#define HV_STATUS_SUCCESS                       0x0000
> -#define HV_STATUS_INVALID_HYPERCALL_CODE        0x0002
> -#define HV_STATUS_INVALID_PARAMETER             0x0005
> -
>  /* Viridian Hypercall Codes. */
> -#define HvFlushVirtualAddressSpace 0x0002
> -#define HvFlushVirtualAddressList  0x0003
> -#define HvNotifyLongSpinWait   0x0008
> -#define HvSendSyntheticClusterIpi  0x000b

>  #define HvGetPartitionId   0x0046
>  #define HvExtCallQueryCapabilities 0x8001

These ought to be added to hyperv-tlfs.h. After all they are specified in the 
TLFS.

  Paul



Re: [Xen-devel] [PATCH] tools/python: Drop test.py

2019-12-18 Thread Wei Liu
On Wed, Dec 18, 2019 at 01:50:06PM +, Andrew Cooper wrote:
> This file hasn't been touched since it was introduced in 2005 (c/s 0c6f36628)
> and has a wildly obsolete shebang for Python 2.3.  Most importantly for us is
> that it isn't Python 3 compatible.
> 
> Drop the file entirely.  Since the 2.3 days, automatic discovery of tests has
> been included in standard functionality.  Rewrite the test rule to use
> "$(PYTHON) -m unittest discover" which is equivalent.
> 
> Dropping test.py drops the only piece of ZPL-2.0 code in the tree.  Drop the
> ancillary files, and adjust COPYING to match.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Wei Liu 


[Xen-devel] [PATCH v2 3/6] x86/viridian: drop private copy of definitions from synic.c

2019-12-18 Thread Wei Liu
Use hyperv-tlfs.h instead. No functional change intended.

Signed-off-by: Wei Liu 
---
 xen/arch/x86/hvm/viridian/synic.c | 68 ---
 1 file changed, 16 insertions(+), 52 deletions(-)

diff --git a/xen/arch/x86/hvm/viridian/synic.c 
b/xen/arch/x86/hvm/viridian/synic.c
index 2791021bcc..54c62f843f 100644
--- a/xen/arch/x86/hvm/viridian/synic.c
+++ b/xen/arch/x86/hvm/viridian/synic.c
@@ -12,58 +12,22 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 
 #include "private.h"
 
-typedef struct _HV_VIRTUAL_APIC_ASSIST
-{
-uint32_t no_eoi:1;
-uint32_t reserved_zero:31;
-} HV_VIRTUAL_APIC_ASSIST;
-
-typedef union _HV_VP_ASSIST_PAGE
-{
-HV_VIRTUAL_APIC_ASSIST ApicAssist;
-uint8_t ReservedZBytePadding[PAGE_SIZE];
-} HV_VP_ASSIST_PAGE;
-
-typedef enum HV_MESSAGE_TYPE {
-HvMessageTypeNone,
-HvMessageTimerExpired = 0x80000010,
-} HV_MESSAGE_TYPE;
-
-typedef struct HV_MESSAGE_FLAGS {
-uint8_t MessagePending:1;
-uint8_t Reserved:7;
-} HV_MESSAGE_FLAGS;
-
-typedef struct HV_MESSAGE_HEADER {
-HV_MESSAGE_TYPE MessageType;
-uint16_t Reserved1;
-HV_MESSAGE_FLAGS MessageFlags;
-uint8_t PayloadSize;
-uint64_t Reserved2;
-} HV_MESSAGE_HEADER;
-
-#define HV_MESSAGE_SIZE 256
-#define HV_MESSAGE_MAX_PAYLOAD_QWORD_COUNT 30
-
-typedef struct HV_MESSAGE {
-HV_MESSAGE_HEADER Header;
-uint64_t Payload[HV_MESSAGE_MAX_PAYLOAD_QWORD_COUNT];
-} HV_MESSAGE;
 
 void __init __maybe_unused build_assertions(void)
 {
-BUILD_BUG_ON(sizeof(HV_MESSAGE) != HV_MESSAGE_SIZE);
+BUILD_BUG_ON(sizeof(struct hv_message) != HV_MESSAGE_SIZE);
 }
 
 void viridian_apic_assist_set(const struct vcpu *v)
 {
 struct viridian_vcpu *vv = v->arch.hvm.viridian;
-HV_VP_ASSIST_PAGE *ptr = vv->vp_assist.ptr;
+struct hv_vp_assist_page *ptr = vv->vp_assist.ptr;
 
 if ( !ptr )
 return;
@@ -77,18 +41,18 @@ void viridian_apic_assist_set(const struct vcpu *v)
 domain_crash(v->domain);
 
 vv->apic_assist_pending = true;
-ptr->ApicAssist.no_eoi = 1;
+ptr->apic_assist = 1;
 }
 
 bool viridian_apic_assist_completed(const struct vcpu *v)
 {
 struct viridian_vcpu *vv = v->arch.hvm.viridian;
-HV_VP_ASSIST_PAGE *ptr = vv->vp_assist.ptr;
+struct hv_vp_assist_page *ptr = vv->vp_assist.ptr;
 
 if ( !ptr )
 return false;
 
-if ( vv->apic_assist_pending && !ptr->ApicAssist.no_eoi )
+if ( vv->apic_assist_pending && !ptr->apic_assist )
 {
 /* An EOI has been avoided */
 vv->apic_assist_pending = false;
@@ -101,12 +65,12 @@ bool viridian_apic_assist_completed(const struct vcpu *v)
 void viridian_apic_assist_clear(const struct vcpu *v)
 {
 struct viridian_vcpu *vv = v->arch.hvm.viridian;
-HV_VP_ASSIST_PAGE *ptr = vv->vp_assist.ptr;
+struct hv_vp_assist_page *ptr = vv->vp_assist.ptr;
 
 if ( !ptr )
 return;
 
-ptr->ApicAssist.no_eoi = 0;
+ptr->apic_assist = 0;
 vv->apic_assist_pending = false;
 }
 
@@ -358,7 +322,7 @@ bool viridian_synic_deliver_timer_msg(struct vcpu *v, 
unsigned int sintx,
 {
 struct viridian_vcpu *vv = v->arch.hvm.viridian;
 const union viridian_sint_msr *vs = &vv->sint[sintx];
-HV_MESSAGE *msg = vv->simp.ptr;
+struct hv_message *msg = vv->simp.ptr;
 struct {
 uint32_t TimerIndex;
 uint32_t Reserved;
@@ -382,19 +346,19 @@ bool viridian_synic_deliver_timer_msg(struct vcpu *v, 
unsigned int sintx,
 
 msg += sintx;
 
-if ( msg->Header.MessageType != HvMessageTypeNone )
+if ( msg->header.message_type != HVMSG_NONE )
 {
-msg->Header.MessageFlags.MessagePending = 1;
+msg->header.message_flags.msg_pending = 1;
 __set_bit(sintx, &vv->msg_pending);
 return false;
 }
 
-msg->Header.MessageType = HvMessageTimerExpired;
-msg->Header.MessageFlags.MessagePending = 0;
-msg->Header.PayloadSize = sizeof(payload);
+msg->header.message_type = HVMSG_TIMER_EXPIRED;
+msg->header.message_flags.msg_pending = 0;
+msg->header.payload_size = sizeof(payload);
 
-BUILD_BUG_ON(sizeof(payload) > sizeof(msg->Payload));
-memcpy(msg->Payload, &payload, sizeof(payload));
+BUILD_BUG_ON(sizeof(payload) > sizeof(msg->u.payload));
+memcpy(msg->u.payload, &payload, sizeof(payload));
 
 if ( !vs->mask )
 vlapic_set_irq(vcpu_vlapic(v), vs->vector, 0);
-- 
2.20.1
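
The BUILD_BUG_ON() in this patch checks at compile time that the message
structure is exactly HV_MESSAGE_SIZE (256) bytes. A minimal standalone
sketch of the same idea using C11 static_assert follows. The struct names
and the exact header field order are assumptions made for illustration;
only the 256-byte size and the 30-qword payload come from the removed
definitions.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

#define MSG_SIZE               256
#define MSG_MAX_PAYLOAD_QWORDS 30

/* Hypothetical 16-byte header: 4 + 1 + 1 + 2 + 8 bytes. */
struct msg_header_sketch {
    uint32_t message_type;
    uint8_t  payload_size;
    uint8_t  message_flags;
    uint8_t  reserved[2];
    uint64_t reserved2;
};

/* Header followed by 30 qwords of payload: 16 + 240 = 256 bytes. */
struct msg_sketch {
    struct msg_header_sketch header;
    uint64_t payload[MSG_MAX_PAYLOAD_QWORDS];
};

/* Compile-time layout checks, playing the role of BUILD_BUG_ON(). */
static_assert(sizeof(struct msg_header_sketch) == 16,
              "header must be 16 bytes");
static_assert(sizeof(struct msg_sketch) == MSG_SIZE,
              "message must be exactly 256 bytes");
```

If a field were added or resized, the build would fail rather than
silently changing the layout shared with the guest.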



[Xen-devel] [PATCH v2 6/6] x86: implement Hyper-V clock source

2019-12-18 Thread Wei Liu
Implement a clock source using Hyper-V's reference TSC page.

Signed-off-by: Wei Liu 
---
v2:
1. Address Jan's comments.

Relevant spec:

https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf

Section 12.6.
---
 xen/arch/x86/time.c | 101 
 1 file changed, 101 insertions(+)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 216169a025..8b96b2e9a5 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -644,6 +645,103 @@ static struct platform_timesource __initdata 
plt_xen_timer =
 };
 #endif
 
+#ifdef CONFIG_HYPERV_GUEST
+/************************************************************
+ * HYPER-V REFERENCE TSC
+ */
+
+static struct ms_hyperv_tsc_page *hyperv_tsc;
+static struct page_info *hyperv_tsc_page;
+
+static int64_t __init init_hyperv_timer(struct platform_timesource *pts)
+{
+paddr_t maddr;
+uint64_t tsc_msr, freq;
+
+if ( !(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE) )
+return 0;
+
+hyperv_tsc_page = alloc_domheap_page(NULL, 0);
+if ( !hyperv_tsc_page )
+return 0;
+
+hyperv_tsc = __map_domain_page_global(hyperv_tsc_page);
+if ( !hyperv_tsc )
+{
+free_domheap_page(hyperv_tsc_page);
+hyperv_tsc_page = NULL;
+return 0;
+}
+
+maddr = page_to_maddr(hyperv_tsc_page);
+
+/*
+ * Per Hyper-V TLFS:
+ *   1. Read existing MSR value
+ *   2. Preserve bits [11:1]
+ *   3. Set bits [63:12] to be guest physical address of tsc page
+ *   4. Set enabled bit (0)
+ *   5. Write back new MSR value
+ */
+rdmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr);
+tsc_msr &= 0xffeULL;
+tsc_msr |=  maddr | 1 /* enabled */;
+wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr);
+
+/* Get TSC frequency from Hyper-V */
+rdmsrl(HV_X64_MSR_TSC_FREQUENCY, freq);
+pts->frequency = freq;
+
+return freq;
+}
+
+static inline uint64_t read_hyperv_timer(void)
+{
+uint64_t scale, offset, ret, tsc;
+uint32_t seq;
+const struct ms_hyperv_tsc_page *tsc_page = hyperv_tsc;
+
+do {
+seq = tsc_page->tsc_sequence;
+
+/* Seq 0 is special. It means the TSC enlightenment is not
+ * available at the moment. The reference time can only be
+ * obtained from the Reference Counter MSR.
+ */
+if ( seq == 0 )
+{
+rdmsrl(HV_X64_MSR_TIME_REF_COUNT, ret);
+return ret;
+}
+
+/* rdtsc_ordered already contains a load fence */
+tsc = rdtsc_ordered();
+scale = tsc_page->tsc_scale;
+offset = tsc_page->tsc_offset;
+
+smp_rmb();
+
+} while (tsc_page->tsc_sequence != seq);
+
+/* ret = ((tsc * scale) >> 64) + offset; */
+asm ( "mul %[scale]; add %[offset], %[ret]"
+  : "+a" (tsc), [ret] "=d" (ret)
+  : [scale] "rm" (scale), [offset] "rm" (offset) );
+
+return ret;
+}
+
+static struct platform_timesource __initdata plt_hyperv_timer =
+{
+.id = "hyperv",
+.name = "HYPER-V REFERENCE TSC",
+.read_counter = read_hyperv_timer,
+.init = init_hyperv_timer,
+/* See TSC time source for why counter_bits is set to 63 */
+.counter_bits = 63,
+};
+#endif
+
 /************************************************************
  * GENERIC PLATFORM TIMER INFRASTRUCTURE
  */
@@ -793,6 +891,9 @@ static u64 __init init_platform_timer(void)
 static struct platform_timesource * __initdata plt_timers[] = {
 #ifdef CONFIG_XEN_GUEST
 &plt_xen_timer,
+#endif
+#ifdef CONFIG_HYPERV_GUEST
+&plt_hyperv_timer,
 #endif
 &plt_hpet, &plt_pmtimer, &plt_pit
 };
-- 
2.20.1
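
The five-step MSR update described in the comment inside
init_hyperv_timer() can be sketched as a pure function, which makes the
bit manipulation easy to check in isolation. The function and macro names
below are hypothetical; the real code reads and writes
HV_X64_MSR_REFERENCE_TSC with rdmsrl()/wrmsrl(), which this sketch
deliberately leaves out.

```c
#include <stdint.h>

#define REF_TSC_ENABLE        1ULL         /* bit 0 */
#define REF_TSC_PRESERVE_MASK 0xffeULL     /* bits 11:1 */
#define REF_TSC_ADDR_MASK     (~0xfffULL)  /* bits 63:12 */

/*
 * Compute the new MSR value from the value read back and the address of
 * the (page-aligned) TSC page: keep bits 11:1, install the address in
 * bits 63:12, and set the enable bit.
 */
static uint64_t ref_tsc_msr_value(uint64_t old_msr, uint64_t page_addr)
{
    uint64_t val = old_msr & REF_TSC_PRESERVE_MASK;

    val |= page_addr & REF_TSC_ADDR_MASK;
    val |= REF_TSC_ENABLE;

    return val;
}
```

For example, an old value with low bits 0xabc and a page at 0x12345000
would give 0x12345abd under these assumptions.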



[Xen-devel] [PATCH v2 4/6] x86/viridian: drop private copy of HV_REFERENCE_TSC_PAGE in time.c

2019-12-18 Thread Wei Liu
Use the one defined in hyperv-tlfs.h instead. No functional change
intended.

Signed-off-by: Wei Liu 
---
 xen/arch/x86/hvm/viridian/time.c | 30 +++---
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/viridian/time.c b/xen/arch/x86/hvm/viridian/time.c
index 6ddca29b29..33c15782e4 100644
--- a/xen/arch/x86/hvm/viridian/time.c
+++ b/xen/arch/x86/hvm/viridian/time.c
@@ -13,19 +13,11 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include "private.h"
 
-typedef struct _HV_REFERENCE_TSC_PAGE
-{
-uint32_t TscSequence;
-uint32_t Reserved1;
-uint64_t TscScale;
-int64_t  TscOffset;
-uint64_t Reserved2[509];
-} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
-
 static void update_reference_tsc(const struct domain *d, bool initialize)
 {
 struct viridian_domain *vd = d->arch.hvm.viridian;
@@ -41,18 +33,18 @@ static void update_reference_tsc(const struct domain *d, 
bool initialize)
  * This enlightenment must be disabled is the host TSC is not invariant.
  * However it is also disabled if vtsc is true (which means rdtsc is
  * being emulated). This generally happens when guest TSC freq and host
- * TSC freq don't match. The TscScale value could be adjusted to cope
+ * TSC freq don't match. The tsc_scale value could be adjusted to cope
  * with this, allowing vtsc to be turned off, but support for this is
  * not yet present in the hypervisor. Thus is it is possible that
  * migrating a Windows VM between hosts of differing TSC frequencies
  * may result in large differences in guest performance. Any jump in
  * TSC due to migration down-time can, however, be compensated for by
- * setting the TscOffset value (see below).
+ * setting the tsc_offset value (see below).
  */
 if ( !host_tsc_is_safe() || d->arch.vtsc )
 {
 /*
- * The specification states that valid values of TscSequence range
+ * The specification states that valid values of tsc_sequence range
  * from 0 to 0xFFFFFFFE. The value 0xFFFFFFFF is used to indicate
  * this mechanism is no longer a reliable source of time and that
  * the VM should fall back to a different source.
@@ -61,7 +53,7 @@ static void update_reference_tsc(const struct domain *d, bool 
initialize)
  * violate the spec. and rely on a value of 0 to indicate that this
  * enlightenment should no longer be used.
  */
-p->TscSequence = 0;
+p->tsc_sequence = 0;
 
 printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: invalidated\n",
d->domain_id);
@@ -72,29 +64,29 @@ static void update_reference_tsc(const struct domain *d, 
bool initialize)
  * The guest will calculate reference time according to the following
  * formula:
  *
- * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
+ * ReferenceTime = ((RDTSC() * tsc_scale) >> 64) + tsc_offset
  *
  * Windows uses a 100ns tick, so we need a scale which is cpu
  * ticks per 100ns shifted left by 64.
  * The offset value is calculated on restore after migration and
  * ensures that Windows will not see a large jump in ReferenceTime.
  */
-p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
-p->TscOffset = trc->off;
+p->tsc_scale = ((10000ul << 32) / d->arch.tsc_khz) << 32;
+p->tsc_offset = trc->off;
 smp_wmb();
 
-seq = p->TscSequence + 1;
+seq = p->tsc_sequence + 1;
 if ( seq == 0xFFFFFFFF || seq == 0 ) /* Avoid both 'invalid' values */
 seq = 1;
 
-p->TscSequence = seq;
+p->tsc_sequence = seq;
 }
 
 /*
  * The specification says: "The partition reference time is computed
  * by the following formula:
  *
- * ReferenceTime = ((VirtualTsc * TscScale) >> 64) + TscOffset
+ * ReferenceTime = ((VirtualTsc * tsc_scale) >> 64) + tsc_offset
  *
  * The multiplication is a 64 bit multiplication, which results in a
  * 128 bit number which is then shifted 64 times to the right to obtain
-- 
2.20.1
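
The reference time formula quoted in the comments above,
ReferenceTime = ((tsc * tsc_scale) >> 64) + tsc_offset, can be tried out
in isolation. The sketch below is not the hypervisor code: the function
names are made up, it relies on the unsigned __int128 extension available
in GCC and Clang on 64-bit targets, and it assumes the scale is derived
for a 100ns tick from a frequency given in kHz (10000 * 2^64 / tsc_khz,
computed in two 32-bit shifts to stay within 64 bits).

```c
#include <stdint.h>

/* Scale for 100ns ticks, assuming tsc_khz is the TSC frequency in kHz. */
static uint64_t ref_tsc_scale(uint64_t tsc_khz)
{
    return ((10000ULL << 32) / tsc_khz) << 32;
}

/* 64x64 -> 128 bit multiply, keep the top 64 bits, then add the offset. */
static uint64_t ref_time(uint64_t tsc, uint64_t scale, int64_t offset)
{
    unsigned __int128 prod = (unsigned __int128)tsc * scale;

    return (uint64_t)(prod >> 64) + (uint64_t)offset;
}
```

With a 2GHz TSC (tsc_khz = 2000000), two billion TSC ticks should come
out as roughly ten million 100ns units, i.e. about one second; the result
is slightly low because the intermediate division truncates.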



[Xen-devel] [PATCH v2 0/6] Implement Hyper-V reference TSC based clock source

2019-12-18 Thread Wei Liu
Hi all

This series adds a clock source based on Hyper-V's reference TSC. The
meat is in the last patch. I also put in some clean up patches to Xen's
viridian code per Paul's request.

With this series, Xen on Hyper-V no longer runs on emulated PIT.

(XEN) Platform timer is 2294.686MHz HYPER-V REFERENCE TSC

Wei.

Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Roger Pau Monné 
Cc: Paul Durrant 

Wei Liu (6):
  x86: import hyperv-tlfs.h from Linux
  x86/viridian: drop duplicate defines from private.h and viridian.c
  x86/viridian: drop private copy of definitions from synic.c
  x86/viridian: drop private copy of HV_REFERENCE_TSC_PAGE in time.c
  x86/hyperv: extract more information from Hyper-V
  x86: implement Hyper-V clock source

 xen/arch/x86/guest/hyperv/hyperv.c  |  17 +
 xen/arch/x86/hvm/viridian/private.h |  66 --
 xen/arch/x86/hvm/viridian/synic.c   |  68 +-
 xen/arch/x86/hvm/viridian/time.c|  30 +-
 xen/arch/x86/hvm/viridian/viridian.c|  23 +-
 xen/arch/x86/time.c | 101 +++
 xen/include/asm-x86/guest/hyperv-tlfs.h | 907 
 xen/include/asm-x86/guest/hyperv.h  |  12 +
 8 files changed, 1070 insertions(+), 154 deletions(-)
 create mode 100644 xen/include/asm-x86/guest/hyperv-tlfs.h

--
2.20.1



[Xen-devel] [PATCH v2 1/6] x86: import hyperv-tlfs.h from Linux

2019-12-18 Thread Wei Liu
Take a pristine copy from Linux commit b2d8b167e15bb5ec2691d1119c025630a247f649.

Do the following to fix it up for Xen:

1. include xen/types.h and xen/bitops.h
2. fix up invocations of BIT macro

Signed-off-by: Wei Liu 
Acked-by: Jan Beulich 
---
 xen/include/asm-x86/guest/hyperv-tlfs.h | 907 
 1 file changed, 907 insertions(+)
 create mode 100644 xen/include/asm-x86/guest/hyperv-tlfs.h

diff --git a/xen/include/asm-x86/guest/hyperv-tlfs.h 
b/xen/include/asm-x86/guest/hyperv-tlfs.h
new file mode 100644
index 0000000000..ccd9850b27
--- /dev/null
+++ b/xen/include/asm-x86/guest/hyperv-tlfs.h
@@ -0,0 +1,907 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * This file contains definitions from Hyper-V Hypervisor Top-Level Functional
+ * Specification (TLFS):
+ * 
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs
+ */
+
+#ifndef _ASM_X86_HYPERV_TLFS_H
+#define _ASM_X86_HYPERV_TLFS_H
+
+#include 
+#include 
+#include 
+
+/*
+ * While not explicitly listed in the TLFS, Hyper-V always runs with a page size
+ * of 4096. These definitions are used when communicating with Hyper-V using
+ * guest physical pages and guest physical page addresses, since the guest page
+ * size may not be 4096 on all architectures.
+ */
+#define HV_HYP_PAGE_SHIFT  12
+#define HV_HYP_PAGE_SIZE   BIT(HV_HYP_PAGE_SHIFT, UL)
+#define HV_HYP_PAGE_MASK   (~(HV_HYP_PAGE_SIZE - 1))
+
+/*
+ * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
+ * is set by CPUID(HvCpuIdFunctionVersionAndFeatures).
+ */
+#define HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS  0x40000000
+#define HYPERV_CPUID_INTERFACE                 0x40000001
+#define HYPERV_CPUID_VERSION                   0x40000002
+#define HYPERV_CPUID_FEATURES                  0x40000003
+#define HYPERV_CPUID_ENLIGHTMENT_INFO          0x40000004
+#define HYPERV_CPUID_IMPLEMENT_LIMITS          0x40000005
+#define HYPERV_CPUID_NESTED_FEATURES           0x4000000A
+
+#define HYPERV_HYPERVISOR_PRESENT_BIT          0x80000000
+#define HYPERV_CPUID_MIN                       0x40000005
+#define HYPERV_CPUID_MAX                       0x4000ffff
+
+/*
+ * Feature identification. EAX indicates which features are available
+ * to the partition based upon the current partition privileges.
+ * These are HYPERV_CPUID_FEATURES.EAX bits.
+ */
+
+/* VP Runtime (HV_X64_MSR_VP_RUNTIME) available */
+#define HV_X64_MSR_VP_RUNTIME_AVAILABLE        BIT(0, UL)
+/* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/
+#define HV_MSR_TIME_REF_COUNT_AVAILABLE        BIT(1, UL)
+/*
+ * Basic SynIC MSRs (HV_X64_MSR_SCONTROL through HV_X64_MSR_EOM
+ * and HV_X64_MSR_SINT0 through HV_X64_MSR_SINT15) available
+ */
+#define HV_X64_MSR_SYNIC_AVAILABLE BIT(2, UL)
+/*
+ * Synthetic Timer MSRs (HV_X64_MSR_STIMER0_CONFIG through
+ * HV_X64_MSR_STIMER3_COUNT) available
+ */
+#define HV_MSR_SYNTIMER_AVAILABLE  BIT(3, UL)
+/*
+ * APIC access MSRs (HV_X64_MSR_EOI, HV_X64_MSR_ICR and HV_X64_MSR_TPR)
+ * are available
+ */
+#define HV_X64_MSR_APIC_ACCESS_AVAILABLE   BIT(4, UL)
+/* Hypercall MSRs (HV_X64_MSR_GUEST_OS_ID and HV_X64_MSR_HYPERCALL) available*/
+#define HV_X64_MSR_HYPERCALL_AVAILABLE BIT(5, UL)
+/* Access virtual processor index MSR (HV_X64_MSR_VP_INDEX) available*/
+#define HV_X64_MSR_VP_INDEX_AVAILABLE  BIT(6, UL)
+/* Virtual system reset MSR (HV_X64_MSR_RESET) is available*/
+#define HV_X64_MSR_RESET_AVAILABLE BIT(7, UL)
+/*
+ * Access statistics pages MSRs (HV_X64_MSR_STATS_PARTITION_RETAIL_PAGE,
+ * HV_X64_MSR_STATS_PARTITION_INTERNAL_PAGE, HV_X64_MSR_STATS_VP_RETAIL_PAGE,
+ * HV_X64_MSR_STATS_VP_INTERNAL_PAGE) available
+ */
+#define HV_X64_MSR_STAT_PAGES_AVAILABLE        BIT(8, UL)
+/* Partition reference TSC MSR is available */
+#define HV_MSR_REFERENCE_TSC_AVAILABLE BIT(9, UL)
+/* Partition Guest IDLE MSR is available */
+#define HV_X64_MSR_GUEST_IDLE_AVAILABLE        BIT(10, UL)
+/*
+ * There is a single feature flag that signifies if the partition has access
+ * to MSRs with local APIC and TSC frequencies.
+ */
+#define HV_X64_ACCESS_FREQUENCY_MSRS   BIT(11, UL)
+/* AccessReenlightenmentControls privilege */
+#define HV_X64_ACCESS_REENLIGHTENMENT  BIT(13, UL)
+
+/*
+ * Feature identification: indicates which flags were specified at partition
+ * creation. The format is the same as the partition creation flag structure
+ * defined in section Partition Creation Flags.
+ * These are HYPERV_CPUID_FEATURES.EBX bits.
+ */
+#define HV_X64_CREATE_PARTITIONS   BIT(0, UL)
+#define HV_X64_ACCESS_PARTITION_ID BIT(1, UL)
+#define HV_X64_ACCESS_MEMORY_POOL  BIT(2, UL)
+#define HV_X64_ADJUST_MESSAGE_BUFFERS  BIT(3, UL)
+#define HV_X64_POST_MESSAGES   BIT(4, UL)
+#define HV_X64_SIGNAL_EVENTS   BIT(5, UL)
+#define HV_X64_CREATE_PORT 

[Xen-devel] [PATCH v2 2/6] x86/viridian: drop duplicate defines from private.h and viridian.c

2019-12-18 Thread Wei Liu
No functional change intended.

Signed-off-by: Wei Liu 
---
 xen/arch/x86/hvm/viridian/private.h  | 66 
 xen/arch/x86/hvm/viridian/viridian.c | 23 +++---
 2 files changed, 6 insertions(+), 83 deletions(-)

diff --git a/xen/arch/x86/hvm/viridian/private.h 
b/xen/arch/x86/hvm/viridian/private.h
index c272c34cda..958a2814c2 100644
--- a/xen/arch/x86/hvm/viridian/private.h
+++ b/xen/arch/x86/hvm/viridian/private.h
@@ -5,72 +5,6 @@
 
 #include 
 
-/* Viridian MSR numbers. */
-#define HV_X64_MSR_GUEST_OS_ID                   0x40000000
-#define HV_X64_MSR_HYPERCALL                     0x40000001
-#define HV_X64_MSR_VP_INDEX                      0x40000002
-#define HV_X64_MSR_RESET                         0x40000003
-#define HV_X64_MSR_VP_RUNTIME                    0x40000010
-#define HV_X64_MSR_TIME_REF_COUNT                0x40000020
-#define HV_X64_MSR_REFERENCE_TSC                 0x40000021
-#define HV_X64_MSR_TSC_FREQUENCY                 0x40000022
-#define HV_X64_MSR_APIC_FREQUENCY                0x40000023
-#define HV_X64_MSR_EOI                           0x40000070
-#define HV_X64_MSR_ICR                           0x40000071
-#define HV_X64_MSR_TPR                           0x40000072
-#define HV_X64_MSR_VP_ASSIST_PAGE                0x40000073
-#define HV_X64_MSR_SCONTROL                      0x40000080
-#define HV_X64_MSR_SVERSION                      0x40000081
-#define HV_X64_MSR_SIEFP                         0x40000082
-#define HV_X64_MSR_SIMP                          0x40000083
-#define HV_X64_MSR_EOM                           0x40000084
-#define HV_X64_MSR_SINT0                         0x40000090
-#define HV_X64_MSR_SINT1                         0x40000091
-#define HV_X64_MSR_SINT2                         0x40000092
-#define HV_X64_MSR_SINT3                         0x40000093
-#define HV_X64_MSR_SINT4                         0x40000094
-#define HV_X64_MSR_SINT5                         0x40000095
-#define HV_X64_MSR_SINT6                         0x40000096
-#define HV_X64_MSR_SINT7                         0x40000097
-#define HV_X64_MSR_SINT8                         0x40000098
-#define HV_X64_MSR_SINT9                         0x40000099
-#define HV_X64_MSR_SINT10                        0x4000009A
-#define HV_X64_MSR_SINT11                        0x4000009B
-#define HV_X64_MSR_SINT12                        0x4000009C
-#define HV_X64_MSR_SINT13                        0x4000009D
-#define HV_X64_MSR_SINT14                        0x4000009E
-#define HV_X64_MSR_SINT15                        0x4000009F
-#define HV_X64_MSR_STIMER0_CONFIG                0x400000B0
-#define HV_X64_MSR_STIMER0_COUNT                 0x400000B1
-#define HV_X64_MSR_STIMER1_CONFIG                0x400000B2
-#define HV_X64_MSR_STIMER1_COUNT                 0x400000B3
-#define HV_X64_MSR_STIMER2_CONFIG                0x400000B4
-#define HV_X64_MSR_STIMER2_COUNT                 0x400000B5
-#define HV_X64_MSR_STIMER3_CONFIG                0x400000B6
-#define HV_X64_MSR_STIMER3_COUNT                 0x400000B7
-#define HV_X64_MSR_POWER_STATE_TRIGGER_C1        0x400000C1
-#define HV_X64_MSR_POWER_STATE_TRIGGER_C2        0x400000C2
-#define HV_X64_MSR_POWER_STATE_TRIGGER_C3        0x400000C3
-#define HV_X64_MSR_POWER_STATE_CONFIG_C1         0x400000D1
-#define HV_X64_MSR_POWER_STATE_CONFIG_C2         0x400000D2
-#define HV_X64_MSR_POWER_STATE_CONFIG_C3         0x400000D3
-#define HV_X64_MSR_STATS_PARTITION_RETAIL_PAGE   0x400000E0
-#define HV_X64_MSR_STATS_PARTITION_INTERNAL_PAGE 0x400000E1
-#define HV_X64_MSR_STATS_VP_RETAIL_PAGE          0x400000E2
-#define HV_X64_MSR_STATS_VP_INTERNAL_PAGE        0x400000E3
-#define HV_X64_MSR_GUEST_IDLE                    0x400000F0
-#define HV_X64_MSR_SYNTH_DEBUG_CONTROL           0x400000F1
-#define HV_X64_MSR_SYNTH_DEBUG_STATUS            0x400000F2
-#define HV_X64_MSR_SYNTH_DEBUG_SEND_BUFFER       0x400000F3
-#define HV_X64_MSR_SYNTH_DEBUG_RECEIVE_BUFFER    0x400000F4
-#define HV_X64_MSR_SYNTH_DEBUG_PENDING_BUFFER    0x400000F5
-#define HV_X64_MSR_CRASH_P0                      0x40000100
-#define HV_X64_MSR_CRASH_P1                      0x40000101
-#define HV_X64_MSR_CRASH_P2                      0x40000102
-#define HV_X64_MSR_CRASH_P3                      0x40000103
-#define HV_X64_MSR_CRASH_P4                      0x40000104
-#define HV_X64_MSR_CRASH_CTL                     0x40000105
-
 int viridian_synic_wrmsr(struct vcpu *v, uint32_t idx, uint64_t val);
 int viridian_synic_rdmsr(const struct vcpu *v, uint32_t idx, uint64_t *val);
 
diff --git a/xen/arch/x86/hvm/viridian/viridian.c 
b/xen/arch/x86/hvm/viridian/viridian.c
index 4b06b78a27..76f6b6510b 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -19,22 +20,10 @@
 
 #include "private.h"
 
-/* Viridian Hypercall Status Codes. */
-#define HV_STATUS_SUCCESS

[Xen-devel] [PATCH v2 5/6] x86/hyperv: extract more information from Hyper-V

2019-12-18 Thread Wei Liu
Provide a structure to store that information. The structure will be
accessed from other places later so make it public.

Signed-off-by: Wei Liu 
Acked-by: Jan Beulich 
---
 xen/arch/x86/guest/hyperv/hyperv.c | 17 +
 xen/include/asm-x86/guest/hyperv.h | 12 
 2 files changed, 29 insertions(+)

diff --git a/xen/arch/x86/guest/hyperv/hyperv.c 
b/xen/arch/x86/guest/hyperv/hyperv.c
index b82ae3833f..2e70b4aa82 100644
--- a/xen/arch/x86/guest/hyperv/hyperv.c
+++ b/xen/arch/x86/guest/hyperv/hyperv.c
@@ -21,6 +21,9 @@
 #include 
 
 #include 
+#include 
+
+struct ms_hyperv_info __read_mostly ms_hyperv;
 
 static const struct hypervisor_ops ops = {
 .name = "Hyper-V",
@@ -40,6 +43,20 @@ const struct hypervisor_ops *__init hyperv_probe(void)
 if ( eax != 0x31237648 )/* Hv#1 */
 return NULL;
 
+/* Extract more information from Hyper-V */
+cpuid(HYPERV_CPUID_FEATURES, &eax, &ebx, &ecx, &edx);
+ms_hyperv.features = eax;
+ms_hyperv.misc_features = edx;
+
+ms_hyperv.hints = cpuid_eax(HYPERV_CPUID_ENLIGHTMENT_INFO);
+
+if ( ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED )
+ms_hyperv.nested_features = cpuid_eax(HYPERV_CPUID_NESTED_FEATURES);
+
+cpuid(HYPERV_CPUID_IMPLEMENT_LIMITS, &eax, &ebx, &ecx, &edx);
+ms_hyperv.max_vp_index = eax;
+ms_hyperv.max_lp_index = ebx;
+
 return &ops;
 }
 
diff --git a/xen/include/asm-x86/guest/hyperv.h 
b/xen/include/asm-x86/guest/hyperv.h
index 3f88b94c77..cc21b9abfc 100644
--- a/xen/include/asm-x86/guest/hyperv.h
+++ b/xen/include/asm-x86/guest/hyperv.h
@@ -21,8 +21,20 @@
 
 #ifdef CONFIG_HYPERV_GUEST
 
+#include 
+
 #include 
 
+struct ms_hyperv_info {
+uint32_t features;
+uint32_t misc_features;
+uint32_t hints;
+uint32_t nested_features;
+uint32_t max_vp_index;
+uint32_t max_lp_index;
+};
+extern struct ms_hyperv_info ms_hyperv;
+
 const struct hypervisor_ops *hyperv_probe(void);
 
 #else
-- 
2.20.1
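
Two details of the probe path lend themselves to a quick standalone check:
the 0x31237648 constant is the ASCII string "Hv#1" as CPUID returns it in
a register, and the stored feature word is later tested bit by bit (bit 9
being the reference TSC flag in hyperv-tlfs.h). The sketch below uses
hypothetical names and a cut-down info struct; it does not execute CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack four ASCII characters as a CPUID register value: the first
 * character lands in the least significant byte. */
static uint32_t cpuid_sig(char a, char b, char c, char d)
{
    return  (uint32_t)(uint8_t)a        |
           ((uint32_t)(uint8_t)b << 8)  |
           ((uint32_t)(uint8_t)c << 16) |
           ((uint32_t)(uint8_t)d << 24);
}

/* Cut-down, hypothetical stand-in for struct ms_hyperv_info. */
struct hv_info_sketch {
    uint32_t features;
};

#define REF_TSC_AVAILABLE (1u << 9)  /* bit 9 of the feature word */

/* True if the feature word advertises the reference TSC page. */
static bool has_reference_tsc(const struct hv_info_sketch *info)
{
    return (info->features & REF_TSC_AVAILABLE) != 0;
}
```

Here cpuid_sig('H', 'v', '#', '1') evaluates to 0x31237648, matching the
constant compared against eax in hyperv_probe().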



[Xen-devel] [PATCH] x86/hvm/rtc: preserved guest RTC offset during suspend/resume/migrate

2019-12-18 Thread Paul Durrant
The emulated RTC is synchronized with the PV wallclock; any write to the
RTC will update struct domain's 'time_offset_seconds' field and call
update_domain_wallclock().

However, the value of 'time_offset_seconds' is not preserved in any save
record and indeed, when the RTC save record is loaded, the CMOS values
will be updated based on an offset value which may or may not have been
set by the toolstack [1]. This may result in making bogus values available
to the guest and messing up any calculations done in the call to
alarm_timer_update() at the end of rtc_load().

This patch extends the RTC save record to contain an offset value, which
will be zero filled on load of an older record. The 'time_offset_seconds'
field in struct domain is also modified into a 'time_offset' struct,
containing a 'seconds' field and a boolean 'set' field.

The code in rtc_load() then uses the new value in the save record to
update the value of struct domain's 'time_offset.seconds' unless
'time_offset.set' is true, which will only be the case if the toolstack has
already performed a XEN_DOMCTL_settimeoffset.

[1] There is currently no way for a toolstack to read the value of
'time_offset_seconds' from struct domain. In the past, any hope of
preservation of the value across a guest life-cycle operation was based
on relying on qemu-dm to write a value into xenstore whenever the RTC
was updated, in response to an IOREQ with type IOREQ_TYPE_TIMEOFFSET
being sent by Xen; see:


https://xenbits.xen.org/gitweb/?p=qemu-xen-traditional.git;a=blob;f=i386-dm/helper2.c#l457

but this behaviour was never forward-ported into upstream QEMU, which
completely ignores that IOREQ type.
In either case, nothing in xl or libxl ever samples the value of
RTC offset from xenstore so any offset adjustment to a non-zero value
performed by the guest (which in the case of Windows is highly likely
as it normally writes RTC in local time, whereas Xen maintains time in
UTC) is completely lost with the de-facto toolstack, and always has
been. Instead, PV drivers are relied upon to paper over this gaping
hole.

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
Cc: Volodymyr Babchuk 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Wei Liu 
Cc: "Roger Pau Monné" 
---
 xen/arch/arm/platform_hypercall.c      |  2 +-
 xen/arch/arm/time.c                    |  3 ++-
 xen/arch/arm/vtimer.c                  |  4 ++--
 xen/arch/x86/hvm/rtc.c                 | 12 ++--
 xen/arch/x86/time.c                    |  3 ++-
 xen/common/time.c                      |  6 +++---
 xen/include/public/arch-x86/hvm/save.h |  2 ++
 xen/include/xen/sched.h                |  5 -
 8 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/platform_hypercall.c b/xen/arch/arm/platform_hypercall.c
index 5aab856ce7..8efac7ee60 100644
--- a/xen/arch/arm/platform_hypercall.c
+++ b/xen/arch/arm/platform_hypercall.c
@@ -53,7 +53,7 @@ long do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
 if ( likely(!op->u.settime64.mbz) )
 do_settime(op->u.settime64.secs,
op->u.settime64.nsecs,
-   op->u.settime64.system_time + SECONDS(d->time_offset_seconds));
+   op->u.settime64.system_time + SECONDS(d->time_offset.seconds));
 else
 ret = -EINVAL;
 break;
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index 739bcf186c..b0021c2c69 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -353,7 +353,8 @@ void update_vcpu_system_time(struct vcpu *v)
 
 void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
 {
-d->time_offset_seconds = time_offset_seconds;
+d->time_offset.seconds = time_offset_seconds;
+d->time_offset.set = true;
 /* XXX update guest visible wallclock time */
 }
 
diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
index e6aebdac9e..240a850b6e 100644
--- a/xen/arch/arm/vtimer.c
+++ b/xen/arch/arm/vtimer.c
@@ -64,8 +64,8 @@ int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config)
 {
 d->arch.phys_timer_base.offset = NOW();
 d->arch.virt_timer_base.offset = READ_SYSREG64(CNTPCT_EL0);
-d->time_offset_seconds = ticks_to_ns(d->arch.virt_timer_base.offset - boot_count);
-do_div(d->time_offset_seconds, 1000000000);
+d->time_offset.seconds = ticks_to_ns(d->arch.virt_timer_base.offset - boot_count);
+do_div(d->time_offset.seconds, 1000000000);
 
 config->clock_frequency = timer_dt_clock_frequency;
 
diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index 42339682e8..bb41efe84a 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -594,7 +594,7 @@ static void rtc_set_time(RTCState *s)
 
 /* We use the guest's setting of the RTC to define the local-time 
  * offset for 

[Xen-devel] [ovmf test] 144927: all pass - PUSHED

2019-12-18 Thread osstest service owner
flight 144927 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144927/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 01b6090b75922bc72604c334bd3dc331490af3bb
baseline version:
 ovmf c5d6a57da02774019127e5ac271de274aee0d9e2

Last test of basis   144923  2019-12-18 06:39:22 Z    0 days
Testing same since   144927  2019-12-18 09:10:04 Z    0 days    1 attempts


People who touched revisions under test:
  Bob Feng 

jobs:
 build-amd64-xsm                          pass
 build-i386-xsm                           pass
 build-amd64                              pass
 build-i386                               pass
 build-amd64-libvirt                      pass
 build-i386-libvirt                       pass
 build-amd64-pvops                        pass
 build-i386-pvops                         pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64     pass
 test-amd64-i386-xl-qemuu-ovmf-amd64      pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   c5d6a57da0..01b6090b75  01b6090b75922bc72604c334bd3dc331490af3bb -> 
xen-tested-master


Re: [Xen-devel] [PATCH v12 2/5] xenbus/backend: Protect xenbus callback with lock

2019-12-18 Thread SeongJae Park
On Wed, 18 Dec 2019 14:30:44 +0100 "Jürgen Groß"  wrote:

> On 18.12.19 13:42, SeongJae Park wrote:
> > On Wed, 18 Dec 2019 13:27:37 +0100 "Jürgen Groß"  wrote:
> > 
> >> On 18.12.19 11:42, SeongJae Park wrote:
> >>> From: SeongJae Park 
> >>>
> >>> 'reclaim_memory' callback can race with a driver code as this callback
> >>> will be called from any memory pressure detected context.  To deal with
> >>> the case, this commit adds a spinlock in the 'xenbus_device'.  Whenever
> >>> 'reclaim_memory' callback is called, the lock of the device which passed
> >>> to the callback as its argument is locked.  Thus, drivers registering
> >>> their 'reclaim_memory' callback should protect the data that might race
> >>> with the callback with the lock by themselves.
> >>
> >> Any reason you don't take the lock around the .probe() and .remove()
> >> calls of the backend (xenbus_dev_probe() and xenbus_dev_remove())? This
> >> would eliminate the need to do that in each backend instead.
> > 
> > First of all, I would like to keep the critical section as small as
> > possible.  With my small test, I could see slightly increasing memory
> > pressure as the critical section becomes wider.  Also, some drivers might
> > share the data their 'reclaim_memory' callback touches with other
> > functions.  I think only the driver owners can know what data is shared
> > and what is the minimum critical section to protect it.
> 
> But this kind of serialization can still be added on top.

I'm still worrying about the unnecessarily large critical section, but it might
be small enough to be ignored.  If no others have strong objection, I will take
the lock around the '->probe()' and '->remove()'.

> 
> And with the trylock in the reclaim path I believe you can even avoid
> the irq variants of the spinlock. But I might be wrong, so you should
> try that with lockdep enabled. If it is working there is no harm done
> when making the critical section larger, as memory allocations will
> work as before.

Yes, you're right.  I will try testing with lockdep.
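
The trylock idea discussed above can be sketched in userspace as follows. This
is only a model under stated assumptions: 'struct fake_device', dev_trylock(),
dev_unlock() and the 'cached_pages' field are hypothetical stand-ins, and a C11
atomic_flag replaces the kernel spinlock. It is not the xenbus code:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a device with a lock; not the kernel's xenbus_device. */
struct fake_device {
    atomic_flag lock;  /* set while probe/remove/reclaim owns the device */
    int cached_pages;  /* data that the reclaim callback may free */
};

/* Try to take the lock without waiting; returns true on success. */
static bool dev_trylock(struct fake_device *dev)
{
    return !atomic_flag_test_and_set_explicit(&dev->lock,
                                              memory_order_acquire);
}

static void dev_unlock(struct fake_device *dev)
{
    atomic_flag_clear_explicit(&dev->lock, memory_order_release);
}

/*
 * Reclaim path: never waits. If probe() or remove() currently holds the
 * lock, skip this device and report 0 pages freed.
 */
static int reclaim_memory(struct fake_device *dev)
{
    int freed;

    if (!dev_trylock(dev))
        return 0;
    freed = dev->cached_pages;
    dev->cached_pages = 0;
    dev_unlock(dev);
    return freed;
}
```

Because the reclaim path only ever tries the lock, it cannot deadlock against
a holder that triggers memory pressure itself, which is the property that
would let the critical section around probe and remove grow without harm.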


Thanks,
SeongJae Park

> 
> 
> Juergen


Re: [Xen-devel] [PATCH v3 5/7] Add Code Review Guide

2019-12-18 Thread Julien Grall

Hi Lars,

On 12/12/2019 21:14, Lars Kurth wrote:

+### Workflow from an Author's Perspective
+
+When code authors receive feedback on their patches, they typically first try
+to clarify feedback they do not understand. For smaller patches or patch series
+it makes sense to wait until receiving feedback on the entire series before
+sending out a new version addressing the changes. For larger series, it may
+make sense to send out a new revision earlier.
+
+As a reviewer, you need some system that he;ps ensure that you address all


Just a small typo: I think you meant "helps" rather than "he;ps".

Cheers,

--
Julien Grall


Re: [Xen-devel] [PATCH v3 2/2] xen/arm: sign extend writes to TimerValue

2019-12-18 Thread Julien Grall

Hi Jeff,

On 11/12/2019 21:13, Jeff Kubascik wrote:

> Per the ARMv8 Reference Manual (ARM DDI 0487E.a), section D11.2.4
> specifies that the values in the TimerValue view of the timers are
> signed in standard two's complement form. When writing to the TimerValue

Do you mean CompareValue register instead of TimerValue register?

> register, it should be signed extended as described by the equation
>
> CompareValue = (Counter[63:0] + SignExtend(TimerValue))[63:0]

This explains the signed part, but it does not explain why the 32-bit
case. So I would mention that TimerValue is a 32-bit signed integer.

Maybe saying "are 32-bit signed in standard ..."

> Signed-off-by: Jeff Kubascik
> ---
>   xen/arch/arm/vtimer.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
> index 21b98ec20a..872181d9b6 100644
> --- a/xen/arch/arm/vtimer.c
> +++ b/xen/arch/arm/vtimer.c
> @@ -211,7 +211,7 @@ static bool vtimer_cntp_tval(struct cpu_user_regs *regs, uint32_t *r,
>   }
>   else
>   {
> -v->arch.phys_timer.cval = cntpct + *r;
> +v->arch.phys_timer.cval = cntpct + (uint64_t)(int32_t)*r;
>   if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE )
>   {
>   v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING;



Cheers,

--
Julien Grall
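
The sign extension discussed in this message can be illustrated in isolation.
The sketch below is a standalone model, not the patch itself: the helper name
tval_to_cval() is hypothetical, and it applies the quoted equation
CompareValue = (Counter[63:0] + SignExtend(TimerValue))[63:0] to plain
integers:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): derive CompareValue from a
 * 32-bit TimerValue write. The int32_t cast reinterprets the register
 * bits as signed, and widening to 64 bits then sign-extends them.
 * Converting values above INT32_MAX is implementation-defined in C; the
 * two's complement result assumed here is what GCC and Clang provide.
 */
static uint64_t tval_to_cval(uint64_t counter, uint32_t tval)
{
    return counter + (uint64_t)(int32_t)tval;
}
```

Without the intermediate int32_t cast, a TimerValue of 0xffffffff would move
the deadline roughly 2^32 ticks into the future instead of one tick into the
past.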


Re: [Xen-devel] [PATCH v3 1/2] xen/arm: remove physical timer offset

2019-12-18 Thread Julien Grall

Hi Jeff,

On 11/12/2019 21:13, Jeff Kubascik wrote:

The physical timer traps apply an offset so that time starts at 0 for
the guest. However, this offset is not currently applied to the physical
counter. Per the ARMv8 Reference Manual (ARM DDI 0487E.a), section
D11.2.4 Timers, the "Offset" between the counter and timer should be
zero for a physical timer. This removes the offset to make the timer and
counter consistent.

This also cleans up the physical timer implementation to better match
the virtual timer - both cval's now hold the hardware value.

Signed-off-by: Jeff Kubascik 
---
  xen/arch/arm/vtimer.c| 34 ++
  xen/include/asm-arm/domain.h |  3 ---
  2 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
index e6aebdac9e..21b98ec20a 100644
--- a/xen/arch/arm/vtimer.c
+++ b/xen/arch/arm/vtimer.c
@@ -62,7 +62,6 @@ static void virt_timer_expired(void *data)
  
  int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config)

  {
-d->arch.phys_timer_base.offset = NOW();
  d->arch.virt_timer_base.offset = READ_SYSREG64(CNTPCT_EL0);
  d->time_offset_seconds = ticks_to_ns(d->arch.virt_timer_base.offset - boot_count);
  do_div(d->time_offset_seconds, 1000000000);
@@ -108,7 +107,6 @@ int vcpu_vtimer_init(struct vcpu *v)
  
  init_timer(&t->timer, phys_timer_expired, t, v->processor);

  t->ctl = 0;
-t->cval = NOW();
  t->irq = d0
  ? timer_get_irq(TIMER_PHYS_NONSECURE_PPI)
  : GUEST_TIMER_PHYS_NS_PPI;
@@ -167,6 +165,7 @@ void virt_timer_restore(struct vcpu *v)
  static bool vtimer_cntp_ctl(struct cpu_user_regs *regs, uint32_t *r, bool 
read)
  {
  struct vcpu *v = current;
+s_time_t expires;
  
  if ( !ACCESS_ALLOWED(regs, EL0PTEN) )

  return false;
@@ -184,8 +183,9 @@ static bool vtimer_cntp_ctl(struct cpu_user_regs *regs, 
uint32_t *r, bool read)
  
  if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE )

  {
-set_timer(&v->arch.phys_timer.timer,
-  v->arch.phys_timer.cval + v->domain->arch.phys_timer_base.offset);
+expires = v->arch.phys_timer.cval > boot_count
+  ? ticks_to_ns(v->arch.phys_timer.cval - boot_count) : 0;
+set_timer(&v->arch.phys_timer.timer, expires);
  }
  else
  stop_timer(&v->arch.phys_timer.timer);
@@ -197,26 +197,27 @@ static bool vtimer_cntp_tval(struct cpu_user_regs *regs, 
uint32_t *r,
   bool read)
  {
  struct vcpu *v = current;
-s_time_t now;
+uint64_t cntpct;
+s_time_t expires;
  
  if ( !ACCESS_ALLOWED(regs, EL0PTEN) )

  return false;
  
-now = NOW() - v->domain->arch.phys_timer_base.offset;

+cntpct = get_cycles();
  
  if ( read )

  {
-*r = (uint32_t)(ns_to_ticks(v->arch.phys_timer.cval - now) & 0xffffffffull);
+*r = (uint32_t)((v->arch.phys_timer.cval - cntpct) & 0xffffffffull);
  }
  else
  {
-v->arch.phys_timer.cval = now + ticks_to_ns(*r);
+v->arch.phys_timer.cval = cntpct + *r;
  if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE )
  {
  v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING;
-set_timer(&v->arch.phys_timer.timer,
-  v->arch.phys_timer.cval +
-  v->domain->arch.phys_timer_base.offset);
+expires = v->arch.phys_timer.cval > boot_count
+  ? ticks_to_ns(v->arch.phys_timer.cval - boot_count) : 0;


You probably want a comment to explain why you set to 0 here.


+set_timer(&v->arch.phys_timer.timer, expires);
  }
  }
  return true;
@@ -226,23 +227,24 @@ static bool vtimer_cntp_cval(struct cpu_user_regs *regs, 
uint64_t *r,
   bool read)
  {
  struct vcpu *v = current;
+s_time_t expires;
  
  if ( !ACCESS_ALLOWED(regs, EL0PTEN) )

  return false;
  
  if ( read )

  {
-*r = ns_to_ticks(v->arch.phys_timer.cval);
+*r = v->arch.phys_timer.cval;
  }
  else
  {
-v->arch.phys_timer.cval = ticks_to_ns(*r);
+v->arch.phys_timer.cval = *r;
  if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE )
  {
  v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING;
-set_timer(&v->arch.phys_timer.timer,
-  v->arch.phys_timer.cval +
-  v->domain->arch.phys_timer_base.offset);
+expires = v->arch.phys_timer.cval > boot_count
+  ? ticks_to_ns(v->arch.phys_timer.cval - boot_count) : 0;


Same here. But I am wondering whether we could factor this code in a 
function. This would avoid code duplication and make the code simpler.


This can be done as a follow-up as we may want to backport the fix.


+set_timer(&v->arch.phys_timer.timer, expires);
  }
  }
  return 
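
The review suggestion to factor the repeated expiry computation into one
function could look roughly like the sketch below. It is a standalone model
under assumptions: phys_timer_expires() and ticks_to_ns_sketch() are
hypothetical names, the counter frequency is passed explicitly instead of
using Xen's own ticks_to_ns(), and unsigned 64-bit values stand in for
s_time_t:

```c
#include <stdint.h>

/*
 * Stand-in for Xen's ticks_to_ns(): convert counter ticks to nanoseconds
 * for a counter running at freq_hz. Splitting into whole seconds and a
 * remainder keeps the remainder product within 64 bits as long as
 * freq_hz fits in 32 bits.
 */
static uint64_t ticks_to_ns_sketch(uint64_t ticks, uint64_t freq_hz)
{
    return (ticks / freq_hz) * 1000000000ull +
           (ticks % freq_hz) * 1000000000ull / freq_hz;
}

/*
 * Hypothetical helper: compute the deadline for a guest CompareValue.
 * A cval at or before boot_count lies before the start of the time base,
 * so the deadline is clamped to 0 (fire immediately) instead of letting
 * the unsigned subtraction wrap around to a huge value.
 */
static uint64_t phys_timer_expires(uint64_t cval, uint64_t boot_count,
                                   uint64_t freq_hz)
{
    return cval > boot_count
           ? ticks_to_ns_sketch(cval - boot_count, freq_hz) : 0;
}
```

A single helper of this kind would also give the requested comment about the
0 case one obvious place to live.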

[Xen-devel] [xen-unstable-smoke test] 144931: tolerable all pass - PUSHED

2019-12-18 Thread osstest service owner
flight 144931 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144931/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl      13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl      14 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  0e7c69bd3c0b35a677d73843b39522787ccf5a3f
baseline version:
 xen  704fa1532801bc02c4500462f0b913b3c137db4d

Last test of basis   144912  2019-12-17 22:02:21 Z    0 days
Testing same since   144931  2019-12-18 12:00:25 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Steven Haigh 
  Wei Liu 

jobs:
 build-arm64-xsm                               pass
 build-amd64                                   pass
 build-armhf                                   pass
 build-amd64-libvirt                           pass
 test-armhf-armhf-xl                           pass
 test-arm64-arm64-xl-xsm                       pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64     pass
 test-amd64-amd64-libvirt                      pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   704fa15328..0e7c69bd3c  0e7c69bd3c0b35a677d73843b39522787ccf5a3f -> smoke


Re: [Xen-devel] [XEN PATCH v3] x86/vm_event: add short-circuit for breakpoints (aka, , "fast single step")

2019-12-18 Thread Tamas K Lengyel
> diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
> index aa54c86325..cb577a7ba9 100644
> --- a/xen/include/public/vm_event.h
> +++ b/xen/include/public/vm_event.h
> @@ -110,6 +110,11 @@
>   * interrupt pending after resuming the VCPU.
>   */
>  #define VM_EVENT_FLAG_GET_NEXT_INTERRUPT (1 << 10)
> +/*
> + * Execute fast singlestepping on vm_event response.
> + * Requires the vCPU to be paused already (synchronous events only).
> + */
> +#define VM_EVENT_FLAG_FAST_SINGLESTEP  (1 << 11)

Just another minor style nitpick: alignment of (1 << 11) is off
compared to all of the previous declarations above.

Tamas


Re: [Xen-devel] [PATCH 9/9] xen/sched: add const qualifier where appropriate

2019-12-18 Thread Dario Faggioli
On Wed, 2019-12-18 at 08:48 +0100, Juergen Gross wrote:
> Make use of the const qualifier more often in scheduling code.
> 
> Signed-off-by: Juergen Gross 
>
Cool!

Reviewed-by: Dario Faggioli 

Another thing that may be worth checking is whether all the places
where 'int' is used for CPU and vCPU IDs (or alike) really need to be
integer, or could be turned into unsigned.

Of course, I'm not suggesting/asking you to do that as well, I'm
just mentioning it in case anyone is interested/has time, or even just for
the record.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)




[Xen-devel] [RFC] Integrate CoC, Governance, Security Policy and other key documents into sphinx docs

2019-12-18 Thread Lars Kurth
Hi all,

now that 4.13 is out of the way I wanted to get the CoC discussion closed - see 
https://lists.xenproject.org/archives/html/xen-devel/2019-12/threads.html#00926,
 which means I need ACKs or final suggestions. The next step would be to 
publish it on the website.

However, I have also been thinking about keeping some documents in multiple 
places and defining a *master* copy somewhere in a tree. Right now, these are a 
few personal repos that I own, which seems unnecessary, given that we have the 
sphinx docs. In the interest of improving the docs, we also need more useful 
content in the docs to guide people to them.

My proposal would be to move the master sources for a number of key process 
docs to xen.git:/docs maybe under a "Working with the Xen Project community" in 
a process-guide directory. 
This would then include content from
• http://xenbits.xen.org/gitweb/?p=people/larsk/governance.git;a=summary
• http://xenbits.xen.org/gitweb/?p=people/larsk/security-process.git;a=summary
• http://xenbits.xen.org/gitweb/?p=people/larsk/code-of-conduct.git;a=summary

and we could also consider including some of the wiki pages related to 
contribution workflow and re-direct the pages.

We would need to answer some questions, such as
a) Are we OK with these staying in markdown - I don’t mind converting
b) Are we OK with some of the documents needing project wide agreement before 
they can be changed, specifically this would cover
- governance.git
- code-of-conduct.git:code-of-conduct.md
- code-of-conduct.git:communication-guide.md

Best Regards
Lars





 


Re: [Xen-devel] [PATCH v2] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs

2019-12-18 Thread Julien Grall

Hi Varad,

Please send a new version of a patch in a new thread rather than in reply
to the first version.


On 18/12/2019 10:53, Varad Gautam wrote:

XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
In that scenario, it is possible to receive multiple _pirq_guest_unbind
calls for the same pirq from domain_kill, if the pirq has not yet been
removed from the domain's pirq_tree, as:
   domain_kill()
 -> domain_relinquish_resources()
   -> pci_release_devices()
 -> pci_clean_dpci_irq()
   -> pirq_guest_unbind()
 -> __pirq_guest_unbind()

For a shared pirq (nr_guests > 1), the first call would zap the current
domain from the pirq's guests[] list, but the action handler is never freed
as there are other guests using this pirq. As a result, on the second call,
__pirq_guest_unbind searches for the current domain which has been removed
from the guests[] list, and hits a BUG_ON.

Make __pirq_guest_unbind safe to be called multiple times by letting xen
continue if a shared pirq has already been unbound from this guest. The
PIRQ will be cleaned up from the domain's pirq_tree during the destruction
in complete_domain_destroy anyways.

Signed-off-by: Varad Gautam 
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Andrew Cooper 

v2: Split the check on action->nr_guests > 0 and make it an ASSERT, reword.
---
  xen/arch/x86/irq.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 5d0d94c..3eb7b22 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind(
  
  for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )

  continue;
-BUG_ON(i == action->nr_guests);
+if ( i == action->nr_guests ) {


The { should be on a new line.


+ASSERT(action->nr_guests > 0) ;


The space before ; is not necessary.


+/* In case the pirq was shared, unbound for this domain in an earlier 
call, but still
+ * existed on the domain's pirq_tree, we still reach here if there are 
any later
+ * unbind calls on the same pirq. Return if such an unbind happens. */


The coding style for comments is:

/*
 * Foo
 * Bar
 */


+if ( action->shareable )
+return NULL;
+BUG();


Given that the previous BUG_ON() was hit, would it make sense to try to
avoid a new BUG()?


So why not just return NULL as you do for action->shareable?


+}
+
  memmove(&action->guest[i], &action->guest[i+1],
  (action->nr_guests-i-1) * sizeof(action->guest[0]));
  action->nr_guests--;



Cheers,

--
Julien Grall
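
The idea of tolerating a repeated unbind can be modelled outside Xen. The
sketch below follows the review suggestion of returning instead of hitting
BUG(); 'struct action_model', unbind_model() and the use of integer domain
IDs are hypothetical simplifications, not the real irq_guest_action_t code:

```c
#include <stdbool.h>
#include <string.h>

#define MAX_GUESTS 7 /* arbitrary bound for the sketch */

struct action_model {
    unsigned int nr_guests;
    int guest[MAX_GUESTS]; /* domain IDs stand in for struct domain pointers */
};

/*
 * Hypothetical model of the unbind lookup. Returns true if 'd' was found
 * and removed, false if it was already gone (a repeated unbind).
 */
static bool unbind_model(struct action_model *action, int d)
{
    unsigned int i;

    for ( i = 0; i < action->nr_guests && action->guest[i] != d; i++ )
        continue;

    if ( i == action->nr_guests )
        return false; /* already unbound by an earlier call */

    /* Close the gap left by the removed entry. */
    memmove(&action->guest[i], &action->guest[i + 1],
            (action->nr_guests - i - 1) * sizeof(action->guest[0]));
    action->nr_guests--;
    return true;
}
```

Calling the function twice for the same domain leaves the list unchanged on
the second call, which is the behaviour a restarted domain_kill() needs.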


[Xen-devel] [PATCH] tools/python: Drop test.py

2019-12-18 Thread Andrew Cooper
This file hasn't been touched since it was introduced in 2005 (c/s 0c6f36628)
and has a wildly obsolete shebang for Python 2.3.  Most importantly for us,
it isn't Python 3 compatible.

Drop the file entirely.  Since the 2.3 days, automatic discovery of tests has
been included in standard functionality.  Rewrite the test rule to use
"$(PYTHON) -m unittest discover" which is equivalent.

Dropping test.py drops the only piece of ZPL-2.0 code in the tree.  Drop the
ancillary files, and adjust COPYING to match.

Signed-off-by: Andrew Cooper 
---
CC: Ian Jackson 
CC: Wei Liu 
CC: Lars Kurth 

This wants backporting to 4.13 as soon as practical.
---
 COPYING   |1 -
 tools/python/Makefile |2 +-
 tools/python/README   |3 -
 tools/python/ZPL-2.0  |   59 ---
 tools/python/test.py  | 1094 -
 5 files changed, 1 insertion(+), 1158 deletions(-)
 delete mode 100644 tools/python/README
 delete mode 100644 tools/python/ZPL-2.0
 delete mode 100644 tools/python/test.py

diff --git a/COPYING b/COPYING
index 80fac091d3..a4bc2b2dd4 100644
--- a/COPYING
+++ b/COPYING
@@ -57,7 +57,6 @@ Xen tree, retaining the original license, such as
   - Laurikari License
   - Public Domain
   - ZLIB License
-  - ZPL 2.0
 
 Significant code imports are highlighted in a README.source file
 in the directory into which the file or code snippet was imported.
diff --git a/tools/python/Makefile b/tools/python/Makefile
index 541858e2f8..e99f78a537 100644
--- a/tools/python/Makefile
+++ b/tools/python/Makefile
@@ -33,7 +33,7 @@ uninstall:
 
 .PHONY: test
 test:
-   export LD_LIBRARY_PATH=$$(readlink -f ../libxc):$$(readlink -f ../xenstore); $(PYTHON) test.py -b -u
+   LD_LIBRARY_PATH=$$(readlink -f ../libxc):$$(readlink -f ../xenstore) $(PYTHON) -m unittest discover
 
 .PHONY: clean
 clean:
diff --git a/tools/python/README b/tools/python/README
deleted file mode 100644
index 8fffef3a00..0000000000
--- a/tools/python/README
+++ /dev/null
@@ -1,3 +0,0 @@
-The file test.py here is from the Zope project, and is Copyright (c) 2001,
-2002 Zope Corporation and Contributors.  This file is released under the Zope
-Public License, version 2.0, a copy of which is in the file ZPL-2.0.
diff --git a/tools/python/ZPL-2.0 b/tools/python/ZPL-2.0
deleted file mode 100644
index 5582f08b89..0000000000
--- a/tools/python/ZPL-2.0
+++ /dev/null
@@ -1,59 +0,0 @@
-Zope Public License (ZPL) Version 2.0

-
-This software is Copyright (c) Zope Corporation (tm) and
-Contributors. All rights reserved.
-
-This license has been certified as open source. It has also
-been designated as GPL compatible by the Free Software
-Foundation (FSF).
-
-Redistribution and use in source and binary forms, with or
-without modification, are permitted provided that the
-following conditions are met:
-
-1. Redistributions in source code must retain the above
-   copyright notice, this list of conditions, and the following
-   disclaimer.
-
-2. Redistributions in binary form must reproduce the above
-   copyright notice, this list of conditions, and the following
-   disclaimer in the documentation and/or other materials
-   provided with the distribution.
-
-3. The name Zope Corporation (tm) must not be used to
-   endorse or promote products derived from this software
-   without prior written permission from Zope Corporation.
-
-4. The right to distribute this software or to use it for
-   any purpose does not give you the right to use Servicemarks
-   (sm) or Trademarks (tm) of Zope Corporation. Use of them is
-   covered in a separate agreement (see
-   http://www.zope.com/Marks).
-
-5. If any files are modified, you must cause the modified
-   files to carry prominent notices stating that you changed
-   the files and the date of any change.
-
-Disclaimer
-
-  THIS SOFTWARE IS PROVIDED BY ZOPE CORPORATION ``AS IS''
-  AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT
-  NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
-  AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN
-  NO EVENT SHALL ZOPE CORPORATION OR ITS CONTRIBUTORS BE
-  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-  EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
-  LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
-  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
-  OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
-  DAMAGE.
-
-
-This software consists of contributions made by Zope
-Corporation and many individuals on behalf of Zope
-Corporation.  Specific attributions are listed in the
-accompanying credits file.
\ No newline at end of file
diff --git a/tools/python/test.py b/tools/python/test.py
deleted file mode 100644
index 13912f61a6..0000000000
--- 

Re: [Xen-devel] [PATCH for-next 7/7] x86: implement Hyper-V clock source

2019-12-18 Thread Wei Liu
On Wed, Dec 18, 2019 at 02:24:33PM +0100, Jan Beulich wrote:
> On 18.12.2019 14:18, Wei Liu wrote:
> > On Wed, Dec 18, 2019 at 01:51:54PM +0100, Jan Beulich wrote:
> >> On 18.12.2019 13:38, Wei Liu wrote:
> >>> On Tue, Dec 10, 2019 at 05:59:04PM +0100, Jan Beulich wrote:
>  On 25.10.2019 11:16, Wei Liu wrote:
> > +static inline uint64_t read_hyperv_timer(void)
> > +{
> > +uint64_t scale, offset, ret, tsc;
> > +uint32_t seq;
> > +struct ms_hyperv_tsc_page *tsc_page = _tsc_page;
> > +
> > +do {
> > +seq = tsc_page->tsc_sequence;
> > +
> > +/* Seq 0 is special. It means the TSC enlightenment is not
> > + * available at the moment. The reference time can only be
> > + * obtained from the Reference Counter MSR.
> > + */
> > +if ( seq == 0 )
> > +{
> > +rdmsrl(HV_X64_MSR_TIME_REF_COUNT, ret);
> > +return ret;
> > +}
> > +
> > +smp_rmb();
> > +
> > +tsc = rdtsc_ordered();
> 
>  This already includes at least a read fence.
> >>>
> >>> OK. rdtsc() should be enough here.
> >>
> >> Are you sure? My comment was rather towards the dropping of smp_rmb()
> >> (maybe replacing by a comment).
> > 
> > I do mean to keep smp_rmb() before it. Is that not enough?
> 
> With
> 
> #define smp_rmb()   barrier()
> 
> it isn't - it's merely a compiler barrier, but for the ordering
> you want you need a fence.

Ah, I see. Thank you.

Wei.
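
For readers following the sequence-number loop in the quoted code, here is a
hedged standalone sketch of such a reader using C11 atomics. Everything in it
is an assumption for illustration: 'struct tsc_page_model', read_ref_time()
and mul_hi64() are hypothetical names, the TSC value is passed in rather than
read with rdtsc, and the scaling step ((tsc * scale) >> 64) + offset reflects
my understanding of the Hyper-V reference TSC page rather than anything
stated in this thread:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Simplified stand-in for the reference TSC page. */
struct tsc_page_model {
    _Atomic uint32_t tsc_sequence;
    _Atomic uint64_t tsc_scale;
    _Atomic int64_t tsc_offset;
};

/* Upper 64 bits of a 64x64-bit product, built from 32-bit halves. */
static uint64_t mul_hi64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + lo_hi;

    return a_hi * b_hi + (hi_lo >> 32) + (cross >> 32);
}

/*
 * Hypothetical reader. Returns 0 and stores the reference time in *out,
 * or -1 if the sequence is 0, in which case the caller has to fall back
 * to the Reference Counter MSR. The acquire load and acquire fence are
 * real ordering points, unlike a barrier that only constrains the compiler.
 */
static int read_ref_time(struct tsc_page_model *p, uint64_t tsc, uint64_t *out)
{
    uint32_t seq;
    uint64_t scale;
    int64_t offset;

    do {
        seq = atomic_load_explicit(&p->tsc_sequence, memory_order_acquire);
        if ( seq == 0 )
            return -1;
        scale = atomic_load_explicit(&p->tsc_scale, memory_order_relaxed);
        offset = atomic_load_explicit(&p->tsc_offset, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        /* Retry if the page was updated while scale/offset were read. */
    } while ( atomic_load_explicit(&p->tsc_sequence,
                                   memory_order_relaxed) != seq );

    *out = mul_hi64(tsc, scale) + (uint64_t)offset;
    return 0;
}
```

The sketch does not settle where the TSC read itself has to be ordered, which
is the point under discussion above; it only shows the retry loop around the
page contents.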

> 
> Jan
