Re: BUG: credit=sched2 machine hang when using DRAKVUF

2020-10-27 Thread Michał Leszczyński
- On 23 Oct 2020 at 6:47, Jürgen Groß jgr...@suse.com wrote:

> On 23.10.20 00:59, Michał Leszczyński wrote:
>> Hello,
>> 
>> when using DRAKVUF against a Windows 7 x64 DomU, the whole machine hangs
>> after a few minutes.
>> 
>> The chance for a hang seems to be correlated with number of PCPUs, in this
>> case we have 14 PCPUs and hang is very easily reproducible, while on other
>> machines with 2-4 PCPUs it's very rare (but still occurring sometimes). The
>> issue is observed with the default sched=credit2 and is no longer
>> reproducible once sched=credit is set.
> 
> Interesting. Can you please share some more information?
> 
> Which Xen version are you using?
> 
> Is there any additional information in the dom0 log which could be
> related to the hang (earlier WARN() splats, Oopses, Xen related
> messages, hardware failure messages, ...)?
> 
> Can you please try to get backtraces of all cpus at the time of the
> hang?
> 
> It would help to know which cpu was the target of the call of
> smp_call_function_single(), so a disassembly of that function would
> be needed to find that information from the dumped registers.
> 
> I'm asking because I've seen a similar problem recently and I was
> rather suspecting a fifo event channel issue than the Xen scheduler,
> but your data suggests it could be the scheduler after all (if it is
> the same issue, of course).
> 
> 
> Juergen


As I've said before, I'm using RELEASE-4.14.0 on a Dell PowerEdge R640 with
14 PCPUs.

I have the following additional pieces of log (enclosed below). As you can
see, particular vCPUs of Dom0 are not being scheduled for a long time, which
seriously degrades the stability of the host system.

Hope this helps somehow.



Best regards,
Michał Leszczyński
CERT Polska

---

[  313.730969] rcu: INFO: rcu_sched self-detected stall on CPU
[  313.731154] rcu: 5-: (5249 ticks this GP) idle=c6e/1/0x4002 softirq=4625/4625 fqs=2624
[  313.731474] rcu:  (t=5250 jiffies g=10309 q=220)
[  338.968676] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [sshd:5991]
[  346.963959] watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [xenconsoled:2747]
(XEN) *** Serial input to Xen (type 'CTRL-a' three times to switch input)
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=384307105230
(XEN) Online Cpus: 0,2,4,6,8,10,12,14,16,18,20,22,24,26
(XEN) Cpupool 0:
(XEN) Cpus: 0,2,4,6,8,10,12,14,16,18,20,22,24,26
(XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
(XEN) Scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Active queues: 2
(XEN)   default-weight = 256
(XEN) Runqueue 0:
(XEN)   ncpus  = 7
(XEN)   cpus   = 0,2,4,6,8,10,12
(XEN)   max_weight = 256
(XEN)   pick_bias  = 10
(XEN)   instload   = 3
(XEN)   aveload= 805194 (~307%)
(XEN)   idlers: ,,,,,,1145
(XEN)   tickled: ,,,,,,
(XEN)   fully idle cores: ,,,,,,1145
(XEN) Runqueue 1:
(XEN)   ncpus  = 7
(XEN)   cpus   = 14,16,18,20,22,24,26
(XEN)   max_weight = 256
(XEN)   pick_bias  = 22
(XEN)   instload   = 0
(XEN)   aveload= 51211 (~19%)
(XEN)   idlers: ,,,,,,05454000
(XEN)   tickled: ,,,,,,
(XEN)   fully idle cores: ,,,,,,05454000
(XEN) Domain info:
(XEN)   Domain: 0 w 256 c 0 v 14
(XEN) 1: [0.0] flags=20 cpu=0 credit=-1000 [w=256] load=4594 (~1%)
(XEN) 2: [0.1] flags=20 cpu=2 credit=9134904 [w=256] load=262144 (~100%)
(XEN) 3: [0.2] flags=22 cpu=4 credit=-1000 [w=256] load=262144 (~100%)
(XEN) 4: [0.3] flags=20 cpu=6 credit=-1000 [w=256] load=4299 (~1%)
(XEN) 5: [0.4] flags=20 cpu=8 credit=-1000 [w=256] load=4537 (~1%)
(XEN) 6: [0.5] flags=22 cpu=10 credit=-1000 [w=256] load=262144 (~100%)
(XEN) 7: [0.6] flags=20 cpu=12 credit=-1000 [w=256] load=5158 (~1%)
(XEN) 8: [0.7] flags=20 cpu=14 credit=10053352 [w=256] load=5150 (~1%)
(XEN) 9: [0.8] flags=20 cpu=16 credit=10200526 [w=256] load=5155 (~1%)
(XEN)10: [0.9] flags=20 cpu=18 credit=10207025 [w=256] load=4939 (~1%)
(XEN)11: [0.10] flags=20 cpu=20 credit=10131199 [w=256] load=5753 (~2%)
(XEN)12: [0.11] flags=20 cpu=22 credit=8103663 [w=256] load=22544 (~8%)
(XEN)13: [0.12] flags=20 cpu=24 credit=10213151 [w=256] load=4905 (~1%)
(XEN)14: [0.13] flags=20 cpu=26 credit=10235821 [w=256] load=4858 (~1%)
(XEN)   Domain: 29 w 256 c 0 v 4
(XEN)15: [29.0] flags=0 cpu=16 credit=1050 [w=256] load=0 (~0%)
(XEN)16: [29.1] flags=0
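
The per-vCPU lines in the scheduler dump above follow a fixed layout, so they can be picked apart mechanically when looking for starved vCPUs. The following is only a minimal sketch and is not part of Xen or DRAKVUF: `parse_vcpu_line` and `load_percent` are hypothetical helper names, the flags field is assumed to be hexadecimal, and the 18-bit load scale is an assumption inferred from the dump itself, where load=262144 is printed as ~100%.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of one "(XEN) N: [dom.vcpu] ..." line from the credit2 dump. */
struct vcpu_line {
    int idx, dom, vcpu, cpu, credit, weight;
    unsigned int flags;   /* parsed as hexadecimal (assumption) */
    long load;
};

/* Returns 0 when all eight fields were parsed, -1 otherwise. */
static int parse_vcpu_line(const char *line, struct vcpu_line *out)
{
    assert(line != NULL && out != NULL);

    int n = sscanf(line,
                   "(XEN)%d: [%d.%d] flags=%x cpu=%d credit=%d [w=%d] load=%ld",
                   &out->idx, &out->dom, &out->vcpu, &out->flags,
                   &out->cpu, &out->credit, &out->weight, &out->load);

    return n == 8 ? 0 : -1;
}

/* Assumed scale: 1 << 18 == 262144 corresponds to 100%. */
static long load_percent(long load)
{
    return (load * 100) >> 18;
}
```

Feeding each per-vCPU line of the dump through `parse_vcpu_line` and listing the entries with a negative credit would pick out Dom0 vCPUs such as [0.2] and [0.5], which the dump also shows at ~100% load. A truncated line like the last one above is rejected rather than half-parsed.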

Re: BUG: credit=sched2 machine hang when using DRAKVUF

2020-10-23 Thread Michał Leszczyński
- On 23 Oct 2020 at 6:47, Jürgen Groß jgr...@suse.com wrote:

> On 23.10.20 00:59, Michał Leszczyński wrote:
>> Hello,
>> 
>> when using DRAKVUF against a Windows 7 x64 DomU, the whole machine hangs
>> after a few minutes.
>> 
>> The chance for a hang seems to be correlated with number of PCPUs, in this
>> case we have 14 PCPUs and hang is very easily reproducible, while on other
>> machines with 2-4 PCPUs it's very rare (but still occurring sometimes). The
>> issue is observed with the default sched=credit2 and is no longer
>> reproducible once sched=credit is set.
> 
> Interesting. Can you please share some more information?
> 
> Which Xen version are you using?

RELEASE-4.14

> 
> Is there any additional information in the dom0 log which could be
> related to the hang (earlier WARN() splats, Oopses, Xen related
> messages, hardware failure messages, ...)?

I will try to find something out next week and will come back to you.

> 
> Can you please try to get backtraces of all cpus at the time of the
> hang?
> 
> It would help to know which cpu was the target of the call of
> smp_call_function_single(), so a disassembly of that function would
> be needed to find that information from the dumped registers.
> 
> I'm asking because I've seen a similar problem recently and I was
> rather suspecting a fifo event channel issue than the Xen scheduler,
> but your data suggests it could be the scheduler after all (if it is
> the same issue, of course).
> 
> 
> Juergen



BUG: credit=sched2 machine hang when using DRAKVUF

2020-10-22 Thread Michał Leszczyński
Hello,

when using DRAKVUF against a Windows 7 x64 DomU, the whole machine hangs after 
a few minutes.

The chance for a hang seems to be correlated with number of PCPUs, in this case 
we have 14 PCPUs and hang is very easily reproducible, while on other machines 
with 2-4 PCPUs it's very rare (but still occurring sometimes). The issue is 
observed with the default sched=credit2 and is no longer reproducible once 
sched=credit is set.


Enclosed: panic log from my Dom0.

Best regards,
Michał Leszczyński
CERT Polska


paź 22 12:20:50 hostname kernel: rcu: INFO: rcu_sched self-detected stall on CPU
paź 22 12:20:50 hostname kernel: rcu: 3-: (21002 ticks this GP) idle=7e2/1/0x4002 softirq=61729/61729 fqs=10490
paź 22 12:20:50 hostname kernel: rcu:  (t=21003 jiffies g=36437 q=9406)
paź 22 12:20:50 hostname kernel: NMI backtrace for cpu 3
paź 22 12:20:50 hostname kernel: CPU: 3 PID: 4153 Comm: drakvuf Tainted: P  
 OEL4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
paź 22 12:20:50 hostname kernel: Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.1.8 04/30/2019
paź 22 12:20:50 hostname kernel: Call Trace:
paź 22 12:20:50 hostname kernel:  
paź 22 12:20:50 hostname kernel:  dump_stack+0x5c/0x80
paź 22 12:20:50 hostname kernel:  nmi_cpu_backtrace.cold.4+0x13/0x50
paź 22 12:20:50 hostname kernel:  ? lapic_can_unplug_cpu.cold.29+0x3b/0x3b
paź 22 12:20:50 hostname kernel:  nmi_trigger_cpumask_backtrace+0xf9/0xfb
paź 22 12:20:50 hostname kernel:  rcu_dump_cpu_stacks+0x9b/0xcb
paź 22 12:20:50 hostname kernel:  rcu_check_callbacks.cold.81+0x1db/0x335
paź 22 12:20:50 hostname kernel:  ? tick_sched_do_timer+0x60/0x60
paź 22 12:20:50 hostname kernel:  update_process_times+0x28/0x60
paź 22 12:20:50 hostname kernel:  tick_sched_handle+0x22/0x60
paź 22 12:20:50 hostname kernel:  tick_sched_timer+0x37/0x70
paź 22 12:20:50 hostname kernel:  __hrtimer_run_queues+0x100/0x280
paź 22 12:20:50 hostname kernel:  hrtimer_interrupt+0x100/0x220
paź 22 12:20:50 hostname kernel:  xen_timer_interrupt+0x1e/0x30
paź 22 12:20:50 hostname kernel:  __handle_irq_event_percpu+0x46/0x190
paź 22 12:20:50 hostname kernel:  handle_irq_event_percpu+0x30/0x80
paź 22 12:20:50 hostname kernel:  handle_percpu_irq+0x40/0x60
paź 22 12:20:50 hostname kernel:  generic_handle_irq+0x27/0x30
paź 22 12:20:50 hostname kernel:  __evtchn_fifo_handle_events+0x17d/0x190
paź 22 12:20:50 hostname kernel:  __xen_evtchn_do_upcall+0x42/0x80
paź 22 12:20:50 hostname kernel:  xen_evtchn_do_upcall+0x27/0x40
paź 22 12:20:50 hostname kernel:  xen_do_hypervisor_callback+0x29/0x40
paź 22 12:20:50 hostname kernel:  
paź 22 12:20:50 hostname kernel: RIP: e030:smp_call_function_single+0xce/0xf0
paź 22 12:20:50 hostname kernel: Code: 8b 4c 24 38 65 48 33 0c 25 28 00 00 00 75 34 c9 c3 48 89 d1 48 89 f2 48 89 e6 e8 6d fe ff ff 8b 54 24 18 83 e2 01 74 0b f3 90 <8b> 54 24 18 83 e2 01 75 f5 eb ca 8b 05 b9 99 4d 01 85 c0 75 88 0f
paź 22 12:20:50 hostname kernel: RSP: e02b:c9004713bd00 EFLAGS: 0202
paź 22 12:20:50 hostname kernel: RAX:  RBX: 888b0b6eea40 RCX: 0200
paź 22 12:20:50 hostname kernel: RDX: 0001 RSI: 8212e4a0 RDI: 81c2dec0
paź 22 12:20:50 hostname kernel: RBP: c9004713bd50 R08:  R09: 888c54052480
paź 22 12:20:50 hostname kernel: R10: 888c540524a8 R11:  R12: c9004713bd60
paź 22 12:20:50 hostname kernel: R13: 8000 R14: 8000 R15: 888b0b6eeab0
paź 22 12:20:50 hostname kernel:  ? xen_pgd_alloc+0x110/0x110
paź 22 12:20:50 hostname kernel:  xen_exit_mmap+0xaa/0x100
paź 22 12:20:50 hostname kernel:  exit_mmap+0x64/0x180
paź 22 12:20:50 hostname kernel:  ? __raw_spin_unlock+0x5/0x10
paź 22 12:20:50 hostname kernel:  ? __handle_mm_fault+0x1090/0x1270
paź 22 12:20:50 hostname kernel:  ? _raw_spin_unlock_irqrestore+0x14/0x20
paź 22 12:20:50 hostname kernel:  ? exit_robust_list+0x5b/0x130
paź 22 12:20:50 hostname kernel:  mmput+0x54/0x130
paź 22 12:20:50 hostname kernel:  do_exit+0x290/0xb90
paź 22 12:20:50 hostname kernel:  ? handle_mm_fault+0xd6/0x200
paź 22 12:20:50 hostname kernel:  do_group_exit+0x3a/0xa0
paź 22 12:20:50 hostname kernel:  __x64_sys_exit_group+0x14/0x20
paź 22 12:20:50 hostname kernel:  do_syscall_64+0x53/0x110
paź 22 12:20:50 hostname kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
paź 22 12:20:50 hostname kernel: RIP: 0033:0x7f98d23ec9d6
paź 22 12:20:50 hostname kernel: Code: Bad RIP value.
paź 22 12:20:50 hostname kernel: RSP: 002b:7ffc4a0327f8 EFLAGS: 0246 ORIG_RAX: 00e7
paź 22 12:20:50 hostname kernel: RAX: ffda RBX: 7f98d24dd760 RCX: 7f98d23ec9d6
paź 22 12:20:50 hostname kernel: RDX:  RSI: 003c RDI: 
paź 22 12:20:50 hostname kernel: RBP:  R08: 00e7 R09: ff60
paź 22 12:20:50 hostname kernel: R10:  R11: 024

Re: [PATCH v6 00/11] Implement support for external IPT monitoring

2020-07-14 Thread Michał Leszczyński
- On 7 Jul 2020 at 21:39, Michał Leszczyński michal.leszczyn...@cert.pl wrote:

> Intel Processor Trace is an architectural extension available in modern Intel
> family CPUs. It allows recording a detailed trace of activity while the
> processor executes code. The recorded trace can be used to reconstruct the
> code flow, that is, to find out which code paths were executed, which
> branches were taken, and so forth.
> 
> The abovementioned feature is described in Intel(R) 64 and IA-32 Architectures
> Software Developer's Manual Volume 3C: System Programming Guide, Part 3,
> Chapter 36: "Intel Processor Trace."
> 
> This patch series implements an interface that Dom0 could use in order to
> enable IPT for particular vCPUs in DomU, allowing for external monitoring.
> Such a feature has numerous applications like malware monitoring, fuzzing, or
> performance testing.
> 
> Also thanks to Tamas K Lengyel for a few preliminary hints before
> first version of this patch was submitted to xen-devel.
> 
> Changed since v1:
>  * MSR_RTIT_CTL is managed using MSR load lists
>  * other PT-related MSRs are modified only when vCPU goes out of context
>  * trace buffer is now acquired as a resource
>  * added vmtrace_pt_size parameter in xl.cfg, the size of trace buffer
>    must be specified at the moment of domain creation
>  * trace buffers are allocated on domain creation, destructed on
>    domain destruction
>  * HVMOP_vmtrace_ipt_enable/disable is limited to enabling/disabling PT;
>    these calls don't manage buffer memory anymore
>  * lifted 32 MFN/GFN array limit when acquiring resources
>  * minor code style changes according to review
> 
> Changed since v2:
>  * trace buffer is now allocated on domain creation (in v2 it was
>    allocated when hvm param was set)
>  * restored 32-item limit in mfn/gfn arrays in acquire_resource
>    and instead implemented hypercall continuations
>  * code changes according to Jan's and Roger's review
> 
> Changed since v3:
>  * vmtrace HVMOPs are now implemented as DOMCTLs
>  * patches split up according to Andrew's comments
>  * code changes according to v3 review on the mailing list
> 
> Changed since v4:
>  * rebased to commit be63d9d4
>  * fixed dependencies between patches
>    (earlier patches don't reference further patches)
>  * introduced preemption check in acquire_resource
>  * moved buffer allocation to common code
>  * split some patches according to code review
>  * minor fixes according to code review
> 
> Changed since v5:
>  * trace buffer size is now dynamically determined by the proctrace
>    tool
>  * trace buffer size variable is uniformly defined as uint32_t
>    processor_trace_buf_kb in hypervisor, toolstack and ABI
>  * buffer pages are not freed explicitly but reference count is
>    now used instead
>  * minor fixes according to code review
> 
> This patch series is available on GitHub:
> https://github.com/icedevml/xen/tree/ipt-patch-v6
> 
> 
> Michal Leszczynski (11):
>  memory: batch processing in acquire_resource()
>  x86/vmx: add Intel PT MSR definitions
>  x86/vmx: add IPT cpu feature
>  common: add vmtrace_pt_size domain parameter
>  tools/libxl: add vmtrace_pt_size parameter
>  x86/hvm: processor trace interface in HVM
>  x86/vmx: implement IPT in VMX
>  x86/mm: add vmtrace_buf resource type
>  x86/domctl: add XEN_DOMCTL_vmtrace_op
>  tools/libxc: add xc_vmtrace_* functions
>  tools/proctrace: add proctrace tool
> 
> docs/man/xl.cfg.5.pod.in|  13 ++
> tools/golang/xenlight/helpers.gen.go|   2 +
> tools/golang/xenlight/types.gen.go  |   1 +
> tools/libxc/Makefile|   1 +
> tools/libxc/include/xenctrl.h   |  40 +
> tools/libxc/xc_vmtrace.c|  87 ++
> tools/libxl/libxl.h |   8 +
> tools/libxl/libxl_create.c  |   1 +
> tools/libxl/libxl_types.idl |   4 +
> tools/proctrace/Makefile|  45 +
> tools/proctrace/proctrace.c | 179 
> tools/xl/xl_parse.c |  22 +++
> xen/arch/x86/domain.c   |  27 +++
> xen/arch/x86/domctl.c   |  50 ++
> xen/arch/x86/hvm/vmx/vmcs.c |  15 +-
> xen/arch/x86/hvm/vmx/vmx.c  | 110 
> xen/common/domain.c |  46 +
> xen/common/memory.c |  80 -
> xen/include/asm-x86/cpufeature.h|   1 +
> xen/include/asm-x86/hvm/hvm.h   |  20 +++
> xen/include/asm-x86/hvm/vmx/vmcs.h  |   4 +
> xen/include/

[PATCH v6 11/11] tools/proctrace: add proctrace tool

2020-07-07 Thread Michał Leszczyński
From: Michal Leszczynski 

Add a demonstration tool that uses xc_vmtrace_* calls in order
to manage external IPT monitoring for DomU.

Signed-off-by: Michal Leszczynski 
---
 tools/proctrace/Makefile|  45 +
 tools/proctrace/proctrace.c | 179 
 2 files changed, 224 insertions(+)
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 tools/proctrace/proctrace.c

diff --git a/tools/proctrace/Makefile b/tools/proctrace/Makefile
new file mode 100644
index 00..9c135229b9
--- /dev/null
+++ b/tools/proctrace/Makefile
@@ -0,0 +1,45 @@
+# Copyright (C) CERT Polska - NASK PIB
+# Author: Michał Leszczyński 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; under version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+XEN_ROOT=$(CURDIR)/../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+CFLAGS  += -Werror
+CFLAGS  += $(CFLAGS_libxenevtchn)
+CFLAGS  += $(CFLAGS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenevtchn)
+LDLIBS  += $(LDLIBS_libxenforeignmemory)
+
+.PHONY: all
+all: build
+
+.PHONY: build
+build: proctrace
+
+.PHONY: install
+install: build
+   $(INSTALL_DIR) $(DESTDIR)$(sbindir)
+   $(INSTALL_PROG) proctrace $(DESTDIR)$(sbindir)/proctrace
+
+.PHONY: uninstall
+uninstall:
+   rm -f $(DESTDIR)$(sbindir)/proctrace
+
+.PHONY: clean
+clean:
+   $(RM) -f proctrace $(DEPS_RM)
+
+.PHONY: distclean
+distclean: clean
+
+-include $(DEPS_INCLUDE)
diff --git a/tools/proctrace/proctrace.c b/tools/proctrace/proctrace.c
new file mode 100644
index 00..3c1ee8
--- /dev/null
+++ b/tools/proctrace/proctrace.c
@@ -0,0 +1,179 @@
+/**
+ * tools/proctrace.c
+ *
+ * Demonstrative tool for collecting Intel Processor Trace data from Xen.
+ *  Could be used to externally monitor a given vCPU in given DomU.
+ *
+ * Copyright (C) 2020 by CERT Polska - NASK PIB
+ *
+ * Authors: Michał Leszczyński, michal.leszczyn...@cert.pl
+ * Date:June, 2020
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; under version 2 of the License.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+volatile int interrupted = 0;
+volatile int domain_down = 0;
+
+void term_handler(int signum) {
+interrupted = 1;
+}
+
+int main(int argc, char* argv[]) {
+xc_interface *xc;
+uint32_t domid;
+uint32_t vcpu_id;
+uint64_t size;
+
+int rc = -1;
+uint8_t *buf = NULL;
+uint64_t last_offset = 0;
+
+xenforeignmemory_handle *fmem;
+xenforeignmemory_resource_handle *fres;
+
+if (signal(SIGINT, term_handler) == SIG_ERR)
+{
+fprintf(stderr, "Failed to register signal handler\n");
+return 1;
+}
+
+if (argc != 3) {
+fprintf(stderr, "Usage: %s <domid> <vcpu_id>\n", argv[0]);
+fprintf(stderr, "It's recommended to redirect this "
+"program's output to file\n");
+fprintf(stderr, "or to pipe its output to xxd or other program.\n");
+return 1;
+}
+
+domid = atoi(argv[1]);
+vcpu_id = atoi(argv[2]);
+
+xc = xc_interface_open(0, 0, 0);
+
+fmem = xenforeignmemory_open(0, 0);
+
+if (!xc) {
+fprintf(stderr, "Failed to open xc interface\n");
+return 1;
+}
+
+rc = xc_vmtrace_pt_enable(xc, domid, vcpu_id);
+
+if (rc) {
+fprintf(stderr, "Failed to call xc_vmtrace_pt_enable\n");
+return 1;
+}
+
+rc = xc_vmtrace_pt_get_offset(xc, domid, vcpu_id, NULL, &size);
+
+if (rc) {
+fprintf(stderr, "Failed to get trace buffer size\n");
+return 1;
+}
+
+fres = xenforeignmemory_map_resource(
+fmem, domid, XENMEM_resource_vmtrace_buf,
+/* vcpu: */ vcpu_id,
+/* frame: */ 0,
+/* num_frames: */ size >> XC_PAGE_SHIFT,
+(void **)&buf,
+PROT_READ, 0);
+
+if (!buf) {
+fprintf(stderr, "Failed to map trace buffer\n");
+return 1;
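
The rest of the tool is cut off in this archive. Since the buffer is described in this series as a ring whose offset resets to 0 on overflow, a reader has to handle wrap-around between two offset samples. The following is a minimal sketch of that step only, under stated assumptions: `copy_new_trace` is a hypothetical helper, not code from the patch, and both offsets are assumed to be strictly smaller than the buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the bytes written between two offset samples into out, which must
 * be able to hold size bytes. Returns the number of bytes copied.
 */
static size_t copy_new_trace(const uint8_t *buf, size_t size,
                             size_t last, size_t cur, uint8_t *out)
{
    assert(last < size && cur < size);

    if (cur >= last) {
        /* No wrap since the previous sample: one contiguous chunk. */
        memcpy(out, buf + last, cur - last);
        return cur - last;
    }

    /* The writer wrapped: tail of the buffer first, then the new head. */
    size_t tail = size - last;
    memcpy(out, buf + last, tail);
    memcpy(out + tail, buf, cur);
    return tail + cur;
}
```

A sketch like this cannot tell "no new data" apart from "exactly one full lap" when both offsets are equal, and it loses data if the writer laps the reader more than once between samples, so the polling interval has to be short relative to the buffer size.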

[PATCH v6 06/11] x86/hvm: processor trace interface in HVM

2020-07-07 Thread Michał Leszczyński
From: Michal Leszczynski 

Implement necessary changes in common code/HVM to support
processor trace features. Define vmtrace_pt_* API and
implement trace buffer allocation/deallocation in common
code.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/domain.c | 21 +
 xen/common/domain.c   | 35 +++
 xen/include/asm-x86/hvm/hvm.h | 20 
 xen/include/xen/sched.h   |  4 
 4 files changed, 80 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index b75017b28b..8ce2ab6b8f 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2205,6 +2205,27 @@ int domain_relinquish_resources(struct domain *d)
 altp2m_vcpu_disable_ve(v);
 }
 
+for_each_vcpu ( d, v )
+{
+unsigned int i;
+uint64_t nr_pages = v->domain->processor_trace_buf_kb * KB(1);
+nr_pages >>= PAGE_SHIFT;
+
+if ( !v->vmtrace.pt_buf )
+continue;
+
+for ( i = 0; i < nr_pages; i++ )
+{
+struct page_info *pg = mfn_to_page(
+mfn_add(page_to_mfn(v->vmtrace.pt_buf), i));
+
+put_page_alloc_ref(pg);
+put_page_and_type(pg);
+}
+
+v->vmtrace.pt_buf = NULL;
+}
+
 if ( is_pv_domain(d) )
 {
 for_each_vcpu ( d, v )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index e6e8f88da1..193099a2ab 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -137,6 +137,38 @@ static void vcpu_destroy(struct vcpu *v)
 free_vcpu_struct(v);
 }
 
+static int vmtrace_alloc_buffers(struct vcpu *v)
+{
+unsigned int i;
+struct page_info *pg;
+uint64_t size = v->domain->processor_trace_buf_kb * KB(1);
+
+pg = alloc_domheap_pages(v->domain, get_order_from_bytes(size),
+ MEMF_no_refcount);
+
+if ( !pg )
+return -ENOMEM;
+
+for ( i = 0; i < (size >> PAGE_SHIFT); i++ )
+{
+struct page_info *pg_iter = mfn_to_page(
+mfn_add(page_to_mfn(pg), i));
+
+if ( !get_page_and_type(pg_iter, v->domain, PGT_writable_page) )
+{
+/*
+ * The domain can't possibly know about this page yet, so failure
+ * here is a clear indication of something fishy going on.
+ */
+domain_crash(v->domain);
+return -ENODATA;
+}
+}
+
+v->vmtrace.pt_buf = pg;
+return 0;
+}
+
 struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
 {
 struct vcpu *v;
@@ -162,6 +194,9 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
 v->vcpu_id = vcpu_id;
 v->dirty_cpu = VCPU_CPU_CLEAN;
 
+if ( d->processor_trace_buf_kb && vmtrace_alloc_buffers(v) != 0 )
+return NULL;
+
 spin_lock_init(&v->virq_lock);
 
 tasklet_init(&v->continue_hypercall_tasklet, NULL, NULL);
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1eb377dd82..476a216205 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -214,6 +214,10 @@ struct hvm_function_table {
 bool_t (*altp2m_vcpu_emulate_ve)(struct vcpu *v);
 int (*altp2m_vcpu_emulate_vmfunc)(const struct cpu_user_regs *regs);
 
+/* vmtrace */
+int (*vmtrace_control_pt)(struct vcpu *v, bool enable);
+int (*vmtrace_get_pt_offset)(struct vcpu *v, uint64_t *offset, uint64_t *size);
+
 /*
  * Parameters and callbacks for hardware-assisted TSC scaling,
  * which are valid only when the hardware feature is available.
@@ -655,6 +659,22 @@ static inline bool altp2m_vcpu_emulate_ve(struct vcpu *v)
 return false;
 }
 
+static inline int vmtrace_control_pt(struct vcpu *v, bool enable)
+{
+if ( hvm_funcs.vmtrace_control_pt )
+return hvm_funcs.vmtrace_control_pt(v, enable);
+
+return -EOPNOTSUPP;
+}
+
+static inline int vmtrace_get_pt_offset(struct vcpu *v, uint64_t *offset, uint64_t *size)
+{
+if ( hvm_funcs.vmtrace_get_pt_offset )
+return hvm_funcs.vmtrace_get_pt_offset(v, offset, size);
+
+return -EOPNOTSUPP;
+}
+
 /*
  * This must be defined as a macro instead of an inline function,
  * because it uses 'struct vcpu' and 'struct domain' which have
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index c046e59886..b6f39233aa 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -253,6 +253,10 @@ struct vcpu
 /* vPCI per-vCPU area, used to store data for long running operations. */
 struct vpci_vcpu vpci;
 
+struct {
+struct page_info *pt_buf;
+} vmtrace;
+
 struct arch_vcpu arch;
 };
 
-- 
2.17.1
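
Both the allocation and the teardown path in this patch convert processor_trace_buf_kb into a byte count and then into a page count. The arithmetic can be sketched as below. This is an illustration rather than hypervisor code: the helper names are hypothetical, 4 KiB pages are assumed, and `order_from_pages` only mimics what an order calculation such as get_order_from_bytes() is expected to return.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4 KiB pages, as on x86 */

/* Number of whole pages covered by a buffer of buf_kb kilobytes. */
static uint64_t trace_buf_pages(uint32_t buf_kb)
{
    /* Widen before shifting so large sizes cannot overflow 32 bits. */
    uint64_t bytes = (uint64_t)buf_kb << 10;

    return bytes >> SKETCH_PAGE_SHIFT;
}

/* Smallest order such that (1 << order) pages cover the request. */
static unsigned int order_from_pages(uint64_t pages)
{
    unsigned int order = 0;

    assert(pages > 0);
    while (((uint64_t)1 << order) < pages)
        order++;

    return order;
}
```

With these assumptions an 8 kB buffer is 2 pages (order 1) and a 64 kB buffer is 16 pages (order 4). Because the toolstack only accepts powers of two, the page count is itself a power of two and the order-sized allocation has no slack.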




[PATCH v6 05/11] tools/libxl: add vmtrace_pt_size parameter

2020-07-07 Thread Michał Leszczyński
From: Michal Leszczynski 

Allow specifying the size of the per-vCPU trace buffer upon
domain creation. This is zero by default (meaning: not enabled).

Signed-off-by: Michal Leszczynski 
---
 docs/man/xl.cfg.5.pod.in | 13 +
 tools/golang/xenlight/helpers.gen.go |  2 ++
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/libxl/libxl.h  |  8 
 tools/libxl/libxl_create.c   |  1 +
 tools/libxl/libxl_types.idl  |  4 
 tools/xl/xl_parse.c  | 22 ++
 7 files changed, 51 insertions(+)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 0532739c1f..ddef9b6014 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -683,6 +683,19 @@ If this option is not specified then it will default to 
B.
 
 =back
 
+=item B
+
+Specifies the size of processor trace buffer that would be allocated
+for each vCPU belonging to this domain. Disabled (i.e.
+B by default. This must be set to
+non-zero value in order to be able to use processor tracing features
+with this domain.
+
+B: In order to use Intel Processor Trace feature, this value
+must be between 8 kB and 4 GB and it must be a power of 2.
+
+=back
+
 =head2 Devices
 
 The following options define the paravirtual, emulated and physical
diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go
index 152c7e8e6b..3ce6f2374b 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1117,6 +1117,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.Altp2M = Altp2MMode(xc.altp2m)
+x.ProcessorTraceBufKb = int(xc.processor_trace_buf_kb)
 
  return nil}
 
@@ -1592,6 +1593,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.altp2m = C.libxl_altp2m_mode(x.Altp2M)
+xc.processor_trace_buf_kb = C.int(x.ProcessorTraceBufKb)
 
  return nil
  }
diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go
index 663c1e86b4..f4bc16c0fd 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -516,6 +516,7 @@ GicVersion GicVersion
 Vuart VuartType
 }
 Altp2M Altp2MMode
+ProcessorTraceBufKb int
 }
 
 type domainBuildInfoTypeUnion interface {
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 1cd6c38e83..fbf222967a 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -438,6 +438,14 @@
  */
 #define LIBXL_HAVE_CREATEINFO_PASSTHROUGH 1
 
+/*
+ * LIBXL_HAVE_PROCESSOR_TRACE_BUF_KB indicates that
+ * libxl_domain_build_info has a processor_trace_buf_kb parameter, which
+ * enables pre-allocation of processor tracing buffers of a given
+ * size.
+ */
+#define LIBXL_HAVE_PROCESSOR_TRACE_BUF_KB 1
+
 /*
  * libxl ABI compatibility
  *
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 2814818e34..4d6318124a 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -608,6 +608,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 .max_evtchn_port = b_info->event_channels,
 .max_grant_frames = b_info->max_grant_frames,
 .max_maptrack_frames = b_info->max_maptrack_frames,
+.processor_trace_buf_kb = b_info->processor_trace_buf_kb,
 };
 
 if (info->type != LIBXL_DOMAIN_TYPE_PV) {
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9d3f05f399..748fde65ab 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -645,6 +645,10 @@ libxl_domain_build_info = Struct("domain_build_info",[
 # supported by x86 HVM and ARM support is planned.
 ("altp2m", libxl_altp2m_mode),
 
+# Size of preallocated processor trace buffers (in KBYTES).
+# Use zero value to disable this feature.
+("processor_trace_buf_kb", integer),
+
 ], dir=DIR_IN,
copy_deprecated_fn="libxl__domain_build_info_copy_deprecated",
 )
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 61b4ef7b7e..87e373b413 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1861,6 +1861,28 @@ void parse_config_data(const char *config_source,
 }
 }
 
+if (!xlu_cfg_get_long(config, "processor_trace_buf_kb", &l, 1) && l) {
+if (l & (l - 1)) {
+fprintf(stderr, "ERROR: processor_trace_buf_kb"
+" - must be a power of 2\n");
+exit(1);
+}
+
+if (l < 8) {
+fprintf(stderr, "ERROR: processor_trace_buf_kb"
+" - value is too small\n");
+exit(1);
+}
+
+if (l > 1024*1024*4) {
+fprintf(stderr, "ERROR: processor_trace_buf_kb"
+" - value is 
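
The hunk is cut off here, but the visible checks together with the xl.cfg text (between 8 kB and 4 GB, power of 2, zero meaning disabled) are enough for a standalone sketch of the validation. `check_trace_buf_kb` and the enum are hypothetical names. The wording of the third error message is truncated in this archive, so the sketch only assumes that it reports the upper bound.

```c
#include <assert.h>

enum buf_kb_status {
    BUF_KB_OK = 0,
    BUF_KB_NOT_POW2,
    BUF_KB_TOO_SMALL,
    BUF_KB_TOO_LARGE
};

/* Mirrors the order of the checks in the hunk above. */
static enum buf_kb_status check_trace_buf_kb(long kb)
{
    /* Negative sizes are treated as a caller bug in this sketch. */
    assert(kb >= 0);

    if (kb == 0)
        return BUF_KB_OK;            /* zero disables the feature */

    if (kb & (kb - 1))
        return BUF_KB_NOT_POW2;      /* more than one bit set */

    if (kb < 8)
        return BUF_KB_TOO_SMALL;

    if (kb > 1024L * 1024 * 4)
        return BUF_KB_TOO_LARGE;     /* above 4 GB, expressed in kB */

    return BUF_KB_OK;
}
```

Returning a status instead of calling exit(1) keeps the sketch testable. Note that because the power-of-two test comes first, a value such as 6 is reported as "not a power of 2" rather than "too small".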

[PATCH v6 08/11] x86/mm: add vmtrace_buf resource type

2020-07-07 Thread Michał Leszczyński
From: Michal Leszczynski 

Allow mapping the processor trace buffer using
acquire_resource().

Signed-off-by: Michal Leszczynski 
---
 xen/common/memory.c | 31 +++
 xen/include/public/memory.h |  1 +
 2 files changed, 32 insertions(+)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index eb42f883df..c0a22eb60f 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1007,6 +1007,32 @@ static long xatp_permission_check(struct domain *d, unsigned int space)
 return xsm_add_to_physmap(XSM_TARGET, current->domain, d);
 }
 
+static int acquire_vmtrace_buf(struct domain *d, unsigned int id,
+   uint64_t frame,
+   uint64_t nr_frames,
+   xen_pfn_t mfn_list[])
+{
+mfn_t mfn;
+unsigned int i;
+uint64_t size;
+struct vcpu *v = domain_vcpu(d, id);
+
+if ( !v || !v->vmtrace.pt_buf )
+return -EINVAL;
+
+mfn = page_to_mfn(v->vmtrace.pt_buf);
+size = v->domain->processor_trace_buf_kb * KB(1);
+
+if ( (frame > (size >> PAGE_SHIFT)) ||
+ (nr_frames > ((size >> PAGE_SHIFT) - frame)) )
+return -EINVAL;
+
+for ( i = 0; i < nr_frames; i++ )
+mfn_list[i] = mfn_x(mfn_add(mfn, frame + i));
+
+return 0;
+}
+
 static int acquire_grant_table(struct domain *d, unsigned int id,
unsigned long frame,
unsigned int nr_frames,
@@ -1117,6 +1143,11 @@ static int acquire_resource(
  mfn_list);
 break;
 
+case XENMEM_resource_vmtrace_buf:
+rc = acquire_vmtrace_buf(d, xmar.id, xmar.frame, xmar.nr_frames,
+ mfn_list);
+break;
+
 default:
 rc = arch_acquire_resource(d, xmar.type, xmar.id, xmar.frame,
xmar.nr_frames, mfn_list);
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 21057ed78e..f4c905a10e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -625,6 +625,7 @@ struct xen_mem_acquire_resource {
 
 #define XENMEM_resource_ioreq_server 0
 #define XENMEM_resource_grant_table 1
+#define XENMEM_resource_vmtrace_buf 2
 
 /*
  * IN - a type-specific resource identifier, which must be zero
-- 
2.17.1
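
The bounds check in acquire_vmtrace_buf() compares nr_frames against the remaining space instead of computing frame + nr_frames, which avoids unsigned wrap-around for very large inputs. A minimal standalone sketch of that pattern follows. `frame_range_ok` is a hypothetical name, and the non-empty-buffer precondition is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when [frame, frame + nr_frames) lies within total_frames. */
static bool frame_range_ok(uint64_t frame, uint64_t nr_frames,
                           uint64_t total_frames)
{
    /* Assumption: a mapped buffer always has at least one frame. */
    assert(total_frames > 0);

    if (frame > total_frames)
        return false;

    /* total_frames - frame cannot underflow after the check above. */
    return nr_frames <= total_frames - frame;
}
```

A naive test of the form frame + nr_frames <= total_frames would accept frame = 1 together with nr_frames = UINT64_MAX, because the sum wraps to 0. The subtraction form rejects it.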




[PATCH v6 10/11] tools/libxc: add xc_vmtrace_* functions

2020-07-07 Thread Michał Leszczyński
From: Michal Leszczynski 

Add functions in libxc that use the new XEN_DOMCTL_vmtrace interface.

Signed-off-by: Michal Leszczynski 
---
 tools/libxc/Makefile  |  1 +
 tools/libxc/include/xenctrl.h | 40 
 tools/libxc/xc_vmtrace.c  | 87 +++
 3 files changed, 128 insertions(+)
 create mode 100644 tools/libxc/xc_vmtrace.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index fae5969a73..605e44501d 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -27,6 +27,7 @@ CTRL_SRCS-y   += xc_csched2.c
 CTRL_SRCS-y   += xc_arinc653.c
 CTRL_SRCS-y   += xc_rt.c
 CTRL_SRCS-y   += xc_tbuf.c
+CTRL_SRCS-y   += xc_vmtrace.c
 CTRL_SRCS-y   += xc_pm.c
 CTRL_SRCS-y   += xc_cpu_hotplug.c
 CTRL_SRCS-y   += xc_resume.c
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 4c89b7294c..491b2c3236 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1585,6 +1585,46 @@ int xc_tbuf_set_cpu_mask(xc_interface *xch, xc_cpumap_t mask);
 
 int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask);
 
+/**
+ * Enable processor trace for given vCPU in given DomU.
+ * Allocate the trace ringbuffer with given size.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_enable(xc_interface *xch, uint32_t domid,
+ uint32_t vcpu);
+
+/**
+ * Disable processor trace for given vCPU in given DomU.
+ * Deallocate the trace ringbuffer.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_disable(xc_interface *xch, uint32_t domid,
+  uint32_t vcpu);
+
+/**
+ * Get current offset inside the trace ringbuffer.
+ * This makes it possible to determine how much data was written into the
+ * buffer. Once the buffer overflows, the offset resets to 0 and the
+ * previous data is overwritten.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @parm offset current offset inside trace buffer will be written there
+ * @parm size the total size of the trace buffer (in bytes)
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_get_offset(xc_interface *xch, uint32_t domid,
+ uint32_t vcpu, uint64_t *offset, uint64_t *size);
+
 int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
 
diff --git a/tools/libxc/xc_vmtrace.c b/tools/libxc/xc_vmtrace.c
new file mode 100644
index 0000000000..ee034da8d3
--- /dev/null
+++ b/tools/libxc/xc_vmtrace.c
@@ -0,0 +1,87 @@
+/**
+ * xc_vmtrace.c
+ *
+ * API for manipulating hardware tracing features
+ *
+ * Copyright (c) 2020, Michal Leszczynski
+ *
+ * Copyright 2020 CERT Polska. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see .
+ */
+
+#include "xc_private.h"
+#include 
+
+int xc_vmtrace_pt_enable(
+xc_interface *xch, uint32_t domid, uint32_t vcpu)
+{
+DECLARE_DOMCTL;
+int rc;
+
+domctl.cmd = XEN_DOMCTL_vmtrace_op;
+domctl.domain = domid;
+domctl.u.vmtrace_op.cmd = XEN_DOMCTL_vmtrace_pt_enable;
+domctl.u.vmtrace_op.vcpu = vcpu;
+domctl.u.vmtrace_op.pad1 = 0;
+domctl.u.vmtrace_op.pad2 = 0;
+
+rc = do_domctl(xch, &domctl);
+return rc;
+}
+
+int xc_vmtrace_pt_get_offset(
+xc_interface *xch, uint32_t domid, uint32_t vcpu,
+uint64_t *offset, uint64_t *size)
+{
+DECLARE_DOMCTL;
+int rc;
+
+domctl.cmd = XEN_DOMCTL_vmtrace_op;
+domctl.domain = domid;
+domctl.u.vmtrace_op.cmd = XEN_DOMCTL_vmtrace_pt_get_offset;
+domctl.u.vmtrace_op.vcpu = vcpu;
+domctl.u.vmtrace_op.pad1 = 0;
+domctl.u.vmtrace_op.pad2 = 0;
+
+rc = do_domctl(xch, &domctl);
+if ( !rc )
+{
+if (offset)
+*offset = domctl.u.vmtrace_op.offset;
+
+if (size)
+*size = domctl.u.vmtrace_op.size;
+}
+
+return rc;
+}
+
+int xc_vmtrace_pt_disable(xc_interface *xch, uint32_t domid, uint32_t 

[PATCH v6 07/11] x86/vmx: implement IPT in VMX

2020-07-07 Thread Michał Leszczyński
From: Michal Leszczynski 

Use the Intel Processor Trace feature to provide the vmtrace_pt_*
interface for HVM/VMX.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/hvm/vmx/vmx.c | 110 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   3 +
 xen/include/asm-x86/hvm/vmx/vmx.h  |  14 
 3 files changed, 127 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index cc6d4ece22..63a5a76e16 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -428,6 +428,56 @@ static void vmx_domain_relinquish_resources(struct domain 
*d)
 vmx_free_vlapic_mapping(d);
 }
 
+static int vmx_init_pt(struct vcpu *v)
+{
+int rc;
+uint64_t size = v->domain->processor_trace_buf_kb * KB(1);
+
+if ( !v->vmtrace.pt_buf || !size )
+return -EINVAL;
+
+/*
+ * We don't accept a trace buffer smaller than a single page,
+ * and the upper bound is defined as 4GB in the specification.
+ * The buffer size must also be a power of 2.
+ */
+if ( size < PAGE_SIZE || size > GB(4) || (size & (size - 1)) )
+return -EINVAL;
+
+v->arch.hvm.vmx.ipt_state = xzalloc(struct ipt_state);
+
+if ( !v->arch.hvm.vmx.ipt_state )
+return -ENOMEM;
+
+v->arch.hvm.vmx.ipt_state->output_base =
+page_to_maddr(v->vmtrace.pt_buf);
+v->arch.hvm.vmx.ipt_state->output_mask.raw = size - 1;
+
+rc = vmx_add_host_load_msr(v, MSR_RTIT_CTL, 0);
+
+if ( rc )
+return rc;
+
+rc = vmx_add_guest_msr(v, MSR_RTIT_CTL,
+  RTIT_CTL_TRACE_EN | RTIT_CTL_OS |
+  RTIT_CTL_USR | RTIT_CTL_BRANCH_EN);
+
+if ( rc )
+return rc;
+
+return 0;
+}
+
+static int vmx_destroy_pt(struct vcpu* v)
+{
+if ( v->arch.hvm.vmx.ipt_state )
+xfree(v->arch.hvm.vmx.ipt_state);
+
+v->arch.hvm.vmx.ipt_state = NULL;
+return 0;
+}
+
+
 static int vmx_vcpu_initialise(struct vcpu *v)
 {
 int rc;
@@ -471,6 +521,14 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 
 vmx_install_vlapic_mapping(v);
 
+if ( v->domain->processor_trace_buf_kb )
+{
+rc = vmx_init_pt(v);
+
+if ( rc )
+return rc;
+}
+
 return 0;
 }
 
@@ -483,6 +541,7 @@ static void vmx_vcpu_destroy(struct vcpu *v)
  * prior to vmx_domain_destroy so we need to disable PML for each vcpu
  * separately here.
  */
+vmx_destroy_pt(v);
 vmx_vcpu_disable_pml(v);
 vmx_destroy_vmcs(v);
 passive_domain_destroy(v);
@@ -513,6 +572,18 @@ static void vmx_save_guest_msrs(struct vcpu *v)
  * be updated at any time via SWAPGS, which we cannot trap.
  */
 v->arch.hvm.vmx.shadow_gs = rdgsshadow();
+
+if ( unlikely(v->arch.hvm.vmx.ipt_state &&
+  v->arch.hvm.vmx.ipt_state->active) )
+{
+uint64_t rtit_ctl;
+rdmsrl(MSR_RTIT_CTL, rtit_ctl);
+BUG_ON(rtit_ctl & RTIT_CTL_TRACE_EN);
+
+rdmsrl(MSR_RTIT_STATUS, v->arch.hvm.vmx.ipt_state->status);
+rdmsrl(MSR_RTIT_OUTPUT_MASK,
+   v->arch.hvm.vmx.ipt_state->output_mask.raw);
+}
 }
 
 static void vmx_restore_guest_msrs(struct vcpu *v)
@@ -524,6 +595,17 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
 
 if ( cpu_has_msr_tsc_aux )
 wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
+
+if ( unlikely(v->arch.hvm.vmx.ipt_state &&
+  v->arch.hvm.vmx.ipt_state->active) )
+{
+wrmsrl(MSR_RTIT_OUTPUT_BASE,
+   v->arch.hvm.vmx.ipt_state->output_base);
+wrmsrl(MSR_RTIT_OUTPUT_MASK,
+   v->arch.hvm.vmx.ipt_state->output_mask.raw);
+wrmsrl(MSR_RTIT_STATUS,
+   v->arch.hvm.vmx.ipt_state->status);
+}
 }
 
 void vmx_update_cpu_exec_control(struct vcpu *v)
@@ -2240,6 +2322,25 @@ static bool vmx_get_pending_event(struct vcpu *v, struct 
x86_event *info)
 return true;
 }
 
+static int vmx_control_pt(struct vcpu *v, bool enable)
+{
+if ( !v->arch.hvm.vmx.ipt_state )
+return -EINVAL;
+
+v->arch.hvm.vmx.ipt_state->active = enable;
+return 0;
+}
+
+static int vmx_get_pt_offset(struct vcpu *v, uint64_t *offset, uint64_t *size)
+{
+if ( !v->arch.hvm.vmx.ipt_state )
+return -EINVAL;
+
+*offset = v->arch.hvm.vmx.ipt_state->output_mask.offset;
+*size = v->arch.hvm.vmx.ipt_state->output_mask.size + 1;
+return 0;
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
 .name = "VMX",
 .cpu_up_prepare   = vmx_cpu_up_prepare,
@@ -2295,6 +2396,8 @@ static struct hvm_function_table __initdata 
vmx_function_table = {
 .altp2m_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
 .altp2m_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
 .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
+.vmtrace_control_pt = vmx_control_pt,
+.vmtrace_get_pt_offset = vmx_get_pt_offset,
 .tsc_scaling = {
 .max_ratio = VMX_TSC_MULTIPLIER_MAX,
 },
@@ 
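
The size checks in vmx_init_pt() and the offset/size split returned by
vmx_get_pt_offset() can be sketched in standalone C. This is only an
illustration, not code from the series: the names pt_buf_size_valid,
pt_mask_size and pt_mask_offset are hypothetical, 4 KiB pages are assumed,
and the bit layout (low 32 bits hold size - 1, high 32 bits hold the write
offset) is assumed from the single-range output layout of
IA32_RTIT_OUTPUT_MASK_PTRS, since the union definition is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL        /* assumption: 4 KiB pages */
#define SKETCH_MAX_SIZE  (4ULL << 30)   /* 4 GiB upper bound named in the patch comment */

/*
 * Same three conditions as the patch: at least one page, at most 4 GiB,
 * and a power of two. The KB value is widened before multiplying so the
 * byte count cannot overflow 32 bits.
 */
static bool pt_buf_size_valid(uint32_t buf_kb)
{
    uint64_t size = (uint64_t)buf_kb * 1024;

    if (size < SKETCH_PAGE_SIZE || size > SKETCH_MAX_SIZE)
        return false;

    return (size & (size - 1)) == 0;
}

/* Buffer size in bytes: the low 32 bits of the raw value hold size - 1. */
static uint64_t pt_mask_size(uint64_t raw)
{
    return (raw & 0xffffffffULL) + 1;
}

/* Current write offset: the high 32 bits of the raw value. */
static uint64_t pt_mask_offset(uint64_t raw)
{
    return raw >> 32;
}
```

With these assumptions a 16 MiB buffer (16384 KB) passes the check, while
12 KB is rejected because it is not a power of two.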

[PATCH v6 09/11] x86/domctl: add XEN_DOMCTL_vmtrace_op

2020-07-07 Thread Michał Leszczyński
From: Michal Leszczynski 

Implement a domctl to manage the runtime state of the
processor trace feature.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/domctl.c   | 50 +
 xen/include/public/domctl.h | 28 +
 2 files changed, 78 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 6f2c69788d..6132499db4 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -322,6 +322,50 @@ void arch_get_domain_info(const struct domain *d,
 info->arch_config.emulation_flags = d->arch.emulation_flags;
 }
 
+static int do_vmtrace_op(struct domain *d, struct xen_domctl_vmtrace_op *op,
+ XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
+{
+int rc;
+struct vcpu *v;
+
+if ( op->pad1 || op->pad2 )
+return -EINVAL;
+
+if ( !vmtrace_supported )
+return -EOPNOTSUPP;
+
+if ( !is_hvm_domain(d) )
+return -EOPNOTSUPP;
+
+if ( op->vcpu >= d->max_vcpus )
+return -EINVAL;
+
+v = domain_vcpu(d, op->vcpu);
+rc = 0;
+
+switch ( op->cmd )
+{
+case XEN_DOMCTL_vmtrace_pt_enable:
+case XEN_DOMCTL_vmtrace_pt_disable:
+vcpu_pause(v);
+rc = vmtrace_control_pt(v, op->cmd == XEN_DOMCTL_vmtrace_pt_enable);
+vcpu_unpause(v);
+break;
+
+case XEN_DOMCTL_vmtrace_pt_get_offset:
+rc = vmtrace_get_pt_offset(v, &op->offset, &op->size);
+
+if ( !rc && d->is_dying )
+rc = -ENODATA;
+break;
+
+default:
+rc = -EOPNOTSUPP;
+}
+
+return rc;
+}
+
 #define MAX_IOPORTS 0x10000
 
 long arch_do_domctl(
@@ -337,6 +381,12 @@ long arch_do_domctl(
 switch ( domctl->cmd )
 {
 
+case XEN_DOMCTL_vmtrace_op:
+ret = do_vmtrace_op(d, &domctl->u.vmtrace_op, u_domctl);
+if ( !ret )
+copyback = true;
+break;
+
 case XEN_DOMCTL_shadow_op:
 ret = paging_domctl(d, &domctl->u.shadow_op, u_domctl, 0);
 if ( ret == -ERESTART )
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 7681675a94..73c7ccbd16 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1136,6 +1136,30 @@ struct xen_domctl_vuart_op {
  */
 };
 
+/* XEN_DOMCTL_vmtrace_op: Perform VM tracing related operation */
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+
+struct xen_domctl_vmtrace_op {
+/* IN variable */
+uint32_t cmd;
+/* Enable/disable external vmtrace for given domain */
+#define XEN_DOMCTL_vmtrace_pt_enable  1
+#define XEN_DOMCTL_vmtrace_pt_disable 2
+#define XEN_DOMCTL_vmtrace_pt_get_offset  3
+domid_t domain;
+uint16_t pad1;
+uint32_t vcpu;
+uint16_t pad2;
+
+/* OUT variable */
+uint64_aligned_t size;
+uint64_aligned_t offset;
+};
+typedef struct xen_domctl_vmtrace_op xen_domctl_vmtrace_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_vmtrace_op_t);
+
+#endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
+
 struct xen_domctl {
 uint32_t cmd;
 #define XEN_DOMCTL_createdomain   1
@@ -1217,6 +1241,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_vuart_op                      81
 #define XEN_DOMCTL_get_cpu_policy                82
 #define XEN_DOMCTL_set_cpu_policy                83
+#define XEN_DOMCTL_vmtrace_op                    84
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1277,6 +1302,9 @@ struct xen_domctl {
 struct xen_domctl_monitor_op        monitor_op;
 struct xen_domctl_psr_alloc         psr_alloc;
 struct xen_domctl_vuart_op          vuart_op;
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+struct xen_domctl_vmtrace_op        vmtrace_op;
+#endif
 uint8_t pad[128];
 } u;
 };
-- 
2.17.1
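
The argument checks performed by do_vmtrace_op() before dispatching can be
sketched as a standalone function. This is a hedged illustration only:
vmtrace_check and the VMTRACE_* constants are hypothetical stand-ins for the
domctl sub-commands, and the vmtrace_supported and is_hvm_domain() tests are
left out because they need hypervisor state. The point it shows is that
every failure path returns a negative errno value.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirrors of the XEN_DOMCTL_vmtrace_pt_* sub-commands. */
enum {
    VMTRACE_PT_ENABLE     = 1,
    VMTRACE_PT_DISABLE    = 2,
    VMTRACE_PT_GET_OFFSET = 3,
};

/*
 * Validate a request in the same order as the patch: padding must be zero,
 * the vCPU index must be in range, and the sub-command must be known.
 * Returns 0 if the request may be dispatched, a negative errno otherwise.
 */
static int vmtrace_check(uint32_t cmd, uint32_t vcpu,
                         uint16_t pad1, uint16_t pad2, uint32_t max_vcpus)
{
    if (pad1 || pad2)
        return -EINVAL;

    if (vcpu >= max_vcpus)
        return -EINVAL;

    switch (cmd) {
    case VMTRACE_PT_ENABLE:
    case VMTRACE_PT_DISABLE:
    case VMTRACE_PT_GET_OFFSET:
        return 0;

    default:
        return -EOPNOTSUPP;
    }
}
```

Rejecting non-zero padding up front keeps those fields available for later
extensions without changing the layout of the structure.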




[PATCH v6 00/11] Implement support for external IPT monitoring

2020-07-07 Thread Michał Leszczyński
Intel Processor Trace is an architectural extension available in modern Intel
CPUs. It records a detailed trace of activity while the processor executes
code. The recorded trace can be used to reconstruct the code flow, that is,
to find out which code paths were executed, which branches were taken, and
so forth.

The abovementioned feature is described in Intel(R) 64 and IA-32 Architectures 
Software Developer's Manual Volume 3C: System Programming Guide, Part 3, 
Chapter 36: "Intel Processor Trace."

This patch series implements an interface that Dom0 could use in order to 
enable IPT for particular vCPUs in DomU, allowing for external monitoring. Such 
a feature has numerous applications like malware monitoring, fuzzing, or 
performance testing.
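
A monitoring tool in Dom0 would typically poll the current write offset and
copy out whatever was written since the previous poll. The wrap-around
handling for that can be sketched as below. This is not part of the series:
pt_drain is a hypothetical helper, it assumes the buffer wrapped at most once
between two polls, and it treats an unchanged offset as "no new data".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the bytes written between the previous offset (last) and the
 * current offset (cur) of a ring of ring_size bytes into out.
 * out must have room for ring_size bytes. Returns the number of bytes
 * copied.
 */
static size_t pt_drain(const uint8_t *ring, size_t ring_size,
                       size_t last, size_t cur, uint8_t *out)
{
    size_t tail;

    if (cur >= last) {
        /* No wrap: the new data is one contiguous span. */
        memcpy(out, ring + last, cur - last);
        return cur - last;
    }

    /* Wrapped: copy the end of the ring first, then the start. */
    tail = ring_size - last;
    memcpy(out, ring + last, tail);
    memcpy(out + tail, ring, cur);

    return tail + cur;
}
```

If the guest can produce more than one buffer's worth of data between polls,
this scheme silently loses data, so the polling interval has to be chosen
with the buffer size in mind.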

Also thanks to Tamas K Lengyel for a few preliminary hints before the
first version of this patch was submitted to xen-devel.

Changed since v1:
  * MSR_RTIT_CTL is managed using MSR load lists
  * other PT-related MSRs are modified only when vCPU goes out of context
  * trace buffer is now acquired as a resource
  * added vmtrace_pt_size parameter in xl.cfg; the size of the trace buffer
must be specified at domain creation time
  * trace buffers are allocated on domain creation and destroyed on
domain destruction
  * HVMOP_vmtrace_ipt_enable/disable is limited to enabling/disabling PT;
these calls don't manage buffer memory anymore
  * lifted 32 MFN/GFN array limit when acquiring resources
  * minor code style changes according to review

Changed since v2:
  * trace buffer is now allocated on domain creation (in v2 it was
allocated when hvm param was set)
  * restored 32-item limit in mfn/gfn arrays in acquire_resource
and instead implemented hypercall continuations
  * code changes according to Jan's and Roger's review

Changed since v3:
  * vmtrace HVMOPs are now implemented as DOMCTLs
  * patches split up according to Andrew's comments
  * code changes according to v3 review on the mailing list

Changed since v4:
  * rebased to commit be63d9d4
  * fixed dependencies between patches
(earlier patches don't reference further patches)
  * introduced preemption check in acquire_resource
  * moved buffer allocation to common code
  * split some patches according to code review
  * minor fixes according to code review

Changed since v5:
  * trace buffer size is now dynamically determined by the proctrace
tool
  * trace buffer size variable is uniformly defined as uint32_t
processor_trace_buf_kb in hypervisor, toolstack and ABI
  * buffer pages are not freed explicitly but reference count is
now used instead
  * minor fixes according to code review

This patch series is available on GitHub:
https://github.com/icedevml/xen/tree/ipt-patch-v6


Michal Leszczynski (11):
  memory: batch processing in acquire_resource()
  x86/vmx: add Intel PT MSR definitions
  x86/vmx: add IPT cpu feature
  common: add vmtrace_pt_size domain parameter
  tools/libxl: add vmtrace_pt_size parameter
  x86/hvm: processor trace interface in HVM
  x86/vmx: implement IPT in VMX
  x86/mm: add vmtrace_buf resource type
  x86/domctl: add XEN_DOMCTL_vmtrace_op
  tools/libxc: add xc_vmtrace_* functions
  tools/proctrace: add proctrace tool

 docs/man/xl.cfg.5.pod.in|  13 ++
 tools/golang/xenlight/helpers.gen.go|   2 +
 tools/golang/xenlight/types.gen.go  |   1 +
 tools/libxc/Makefile|   1 +
 tools/libxc/include/xenctrl.h   |  40 +
 tools/libxc/xc_vmtrace.c|  87 ++
 tools/libxl/libxl.h |   8 +
 tools/libxl/libxl_create.c  |   1 +
 tools/libxl/libxl_types.idl |   4 +
 tools/proctrace/Makefile|  45 +
 tools/proctrace/proctrace.c | 179 
 tools/xl/xl_parse.c |  22 +++
 xen/arch/x86/domain.c   |  27 +++
 xen/arch/x86/domctl.c   |  50 ++
 xen/arch/x86/hvm/vmx/vmcs.c |  15 +-
 xen/arch/x86/hvm/vmx/vmx.c  | 110 
 xen/common/domain.c |  46 +
 xen/common/memory.c |  80 -
 xen/include/asm-x86/cpufeature.h|   1 +
 xen/include/asm-x86/hvm/hvm.h   |  20 +++
 xen/include/asm-x86/hvm/vmx/vmcs.h  |   4 +
 xen/include/asm-x86/hvm/vmx/vmx.h   |  14 ++
 xen/include/asm-x86/msr-index.h |  24 +++
 xen/include/public/arch-x86/cpufeatureset.h |   1 +
 xen/include/public/domctl.h |  29 
 xen/include/public/memory.h |   1 +
 xen/include/xen/domain.h|   2 +
 xen/include/xen/sched.h |   7 +
 28 files changed, 828 insertions(+), 6 deletions(-)
 create mode 100644 tools/libxc/xc_vmtrace.c
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 

[PATCH v6 04/11] common: add vmtrace_pt_size domain parameter

2020-07-07 Thread Michał Leszczyński
From: Michal Leszczynski 

Add the processor_trace_buf_kb parameter to the live domain and
to xen_domctl_createdomain.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/domain.c   | 6 ++
 xen/common/domain.c | 9 +
 xen/include/public/domctl.h | 1 +
 xen/include/xen/sched.h | 3 +++
 4 files changed, 19 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fee6c3931a..b75017b28b 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -499,6 +499,12 @@ int arch_sanitise_domain_config(struct 
xen_domctl_createdomain *config)
  */
 config->flags |= XEN_DOMCTL_CDF_oos_off;
 
+if ( !hvm && config->processor_trace_buf_kb )
+{
+dprintk(XENLOG_INFO, "Processor trace is not supported on non-HVM\n");
+return -EINVAL;
+}
+
 return 0;
 }
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index a45cf023f7..e6e8f88da1 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -338,6 +338,12 @@ static int sanitise_domain_config(struct 
xen_domctl_createdomain *config)
 return -EINVAL;
 }
 
+if ( config->processor_trace_buf_kb && !vmtrace_supported )
+{
+dprintk(XENLOG_INFO, "Processor tracing is not supported\n");
+return -EINVAL;
+}
+
 return arch_sanitise_domain_config(config);
 }
 
@@ -443,6 +449,9 @@ struct domain *domain_create(domid_t domid,
 d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
 
 radix_tree_init(&d->pirq_tree);
+
+if ( config->processor_trace_buf_kb )
+d->processor_trace_buf_kb = config->processor_trace_buf_kb;
 }
 
 if ( (err = arch_domain_create(d, config)) != 0 )
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 59bdc28c89..7681675a94 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
 uint32_t max_evtchn_port;
 int32_t max_grant_frames;
 int32_t max_maptrack_frames;
+uint32_t processor_trace_buf_kb;
 
 struct xen_arch_domainconfig arch;
 };
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ac53519d7f..c046e59886 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -457,6 +457,9 @@ struct domain
 unsigned            pbuf_idx;
 spinlock_t  pbuf_lock;
 
+/* Used by vmtrace features */
+uint32_t            processor_trace_buf_kb;
+
 /* OProfile support. */
 struct xenoprof *xenoprof;
 
-- 
2.17.1




[PATCH v6 01/11] memory: batch processing in acquire_resource()

2020-07-07 Thread Michał Leszczyński
From: Michal Leszczynski 

Allow large resources to be acquired by letting acquire_resource()
process items in batches, using hypercall continuation.

Be aware that this modifies the behavior of an acquire_resource
call with frame_list=NULL. Previously it returned the size of the
internal array (32); with this patch it returns the maximum number
of frames that can be requested at once,
i.e. UINT_MAX >> MEMOP_EXTENT_SHIFT.

Signed-off-by: Michal Leszczynski 
---
 xen/common/memory.c | 49 -
 1 file changed, 44 insertions(+), 5 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c1e5..eb42f883df 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1046,10 +1046,12 @@ static int acquire_grant_table(struct domain *d, 
unsigned int id,
 }
 
 static int acquire_resource(
-XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
+XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg,
+unsigned long *start_extent)
 {
 struct domain *d, *currd = current->domain;
 xen_mem_acquire_resource_t xmar;
+uint32_t total_frames;
 /*
  * The mfn_list and gfn_list (below) arrays are ok on stack for the
  * moment since they are small, but if they need to grow in future
@@ -1069,7 +1071,7 @@ static int acquire_resource(
 if ( xmar.nr_frames )
 return -EINVAL;
 
-xmar.nr_frames = ARRAY_SIZE(mfn_list);
+xmar.nr_frames = UINT_MAX >> MEMOP_EXTENT_SHIFT;
 
 if ( __copy_field_to_guest(arg, &xmar, nr_frames) )
 return -EFAULT;
@@ -1077,8 +1079,28 @@ static int acquire_resource(
 return 0;
 }
 
+total_frames = xmar.nr_frames;
+
+/* Is the size too large for us to encode a continuation? */
+if ( unlikely(xmar.nr_frames > (UINT_MAX >> MEMOP_EXTENT_SHIFT)) )
+return -EINVAL;
+
+if ( *start_extent )
+{
+/*
+ * Check whether start_extent is in bounds, as this
+ * value is visible to the calling domain.
+ */
+if ( *start_extent > xmar.nr_frames )
+return -EINVAL;
+
+xmar.frame += *start_extent;
+xmar.nr_frames -= *start_extent;
+guest_handle_add_offset(xmar.frame_list, *start_extent);
+}
+
 if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
-return -E2BIG;
+xmar.nr_frames = ARRAY_SIZE(mfn_list);
 
 rc = rcu_lock_remote_domain_by_id(xmar.domid, &d);
 if ( rc )
@@ -1135,6 +1157,14 @@ static int acquire_resource(
 }
 }
 
+if ( !rc )
+{
+*start_extent += xmar.nr_frames;
+
+if ( *start_extent != total_frames )
+rc = -ERESTART;
+}
+
  out:
 rcu_unlock_domain(d);
 
@@ -1599,8 +1629,17 @@ long do_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 #endif
 
 case XENMEM_acquire_resource:
-rc = acquire_resource(
-guest_handle_cast(arg, xen_mem_acquire_resource_t));
+do {
+rc = acquire_resource(
+guest_handle_cast(arg, xen_mem_acquire_resource_t),
+&start_extent);
+
+if ( hypercall_preempt_check() )
+return hypercall_create_continuation(
+__HYPERVISOR_memory_op, "lh",
+op | (start_extent << MEMOP_EXTENT_SHIFT), arg);
+} while ( rc == -ERESTART );
+
 break;
 
 default:
-- 
2.17.1
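
The batching scheme in this patch can be illustrated with a small standalone
simulation. It is a sketch under stated assumptions, not the hypervisor code:
acquire_batch and acquire_all are hypothetical names, SKETCH_ERESTART is a
placeholder value, and the actual mapping of frames and the preemption check
are omitted. Each call handles at most 32 frames, advances start_extent, and
asks to be restarted until all frames have been processed.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_BATCH    32u   /* size of the on-stack arrays in the patch */
#define SKETCH_ERESTART 85    /* placeholder for the hypervisor's ERESTART */

/*
 * Process one batch. Returns 0 when everything is done, -SKETCH_ERESTART
 * when more batches remain, or -EINVAL for an out-of-range start_extent.
 */
static int acquire_batch(uint32_t total_frames, unsigned long *start_extent)
{
    uint32_t nr;

    /* start_extent is caller-visible, so it must be bounds-checked. */
    if (*start_extent > total_frames)
        return -EINVAL;

    nr = total_frames - (uint32_t)*start_extent;
    if (nr > SKETCH_BATCH)
        nr = SKETCH_BATCH;

    /* The real code would look up and map nr frames at this point. */

    *start_extent += nr;

    return (*start_extent != total_frames) ? -SKETCH_ERESTART : 0;
}

/* Drive the batches to completion; returns the number of calls needed. */
static unsigned int acquire_all(uint32_t total_frames)
{
    unsigned long start_extent = 0;
    unsigned int calls = 0;
    int rc;

    do {
        rc = acquire_batch(total_frames, &start_extent);
        calls++;
    } while (rc == -SKETCH_ERESTART);

    return rc ? 0 : calls;
}
```

For example, a request for 100 frames is served in four batches of 32, 32,
32 and 4 frames.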




[PATCH v6 03/11] x86/vmx: add IPT cpu feature

2020-07-07 Thread Michał Leszczyński
From: Michal Leszczynski 

Check whether the Intel Processor Trace feature is supported by the
current processor. Define the vmtrace_supported global variable.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/hvm/vmx/vmcs.c | 15 ++-
 xen/common/domain.c |  2 ++
 xen/include/asm-x86/cpufeature.h|  1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  |  1 +
 xen/include/public/arch-x86/cpufeatureset.h |  1 +
 xen/include/xen/domain.h|  2 ++
 6 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index ca94c2bedc..3a53553f10 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -291,6 +291,20 @@ static int vmx_init_vmcs_config(void)
 _vmx_cpu_based_exec_control &=
 ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING);
 
+rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
+
+/* Check whether IPT is supported in VMX operation. */
+if ( !smp_processor_id() )
+vmtrace_supported = cpu_has_ipt &&
+(_vmx_misc_cap & VMX_MISC_PROC_TRACE);
+else if ( vmtrace_supported &&
+  !(_vmx_misc_cap & VMX_MISC_PROC_TRACE) )
+{
+printk("VMX: IPT capabilities fatally differ between CPU%u and CPU0\n",
+   smp_processor_id());
+return -EINVAL;
+}
+
 if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS )
 {
 min = 0;
@@ -305,7 +319,6 @@ static int vmx_init_vmcs_config(void)
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
SECONDARY_EXEC_XSAVES |
SECONDARY_EXEC_TSC_SCALING);
-rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
 if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
 opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
 if ( opt_vpid_enabled )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 7cc9526139..a45cf023f7 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -82,6 +82,8 @@ struct vcpu *idle_vcpu[NR_CPUS] __read_mostly;
 
 vcpu_info_t dummy_vcpu_info;
 
+bool vmtrace_supported __read_mostly;
+
 static void __domain_finalise_shutdown(struct domain *d)
 {
 struct vcpu *v;
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index f790d5c1f8..555f696a26 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -104,6 +104,7 @@
 #define cpu_has_clwb            boot_cpu_has(X86_FEATURE_CLWB)
 #define cpu_has_avx512er        boot_cpu_has(X86_FEATURE_AVX512ER)
 #define cpu_has_avx512cd        boot_cpu_has(X86_FEATURE_AVX512CD)
+#define cpu_has_ipt             boot_cpu_has(X86_FEATURE_PROC_TRACE)
 #define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
 #define cpu_has_avx512bw        boot_cpu_has(X86_FEATURE_AVX512BW)
 #define cpu_has_avx512vl        boot_cpu_has(X86_FEATURE_AVX512VL)
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h 
b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 906810592f..6153ba6769 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -283,6 +283,7 @@ extern u32 vmx_secondary_exec_control;
 #define VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL 0x80000000000ULL
 extern u64 vmx_ept_vpid_cap;
 
+#define VMX_MISC_PROC_TRACE 0x4000
 #define VMX_MISC_CR3_TARGET                     0x01ff0000
 #define VMX_MISC_VMWRITE_ALL                    0x20000000
 
diff --git a/xen/include/public/arch-x86/cpufeatureset.h 
b/xen/include/public/arch-x86/cpufeatureset.h
index fe7492a225..2c91862f2d 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -217,6 +217,7 @@ XEN_CPUFEATURE(SMAP,  5*32+20) /*S  Supervisor Mode 
Access Prevention */
 XEN_CPUFEATURE(AVX512_IFMA,   5*32+21) /*A  AVX-512 Integer Fused Multiply Add 
*/
 XEN_CPUFEATURE(CLFLUSHOPT,5*32+23) /*A  CLFLUSHOPT instruction */
 XEN_CPUFEATURE(CLWB,  5*32+24) /*A  CLWB instruction */
+XEN_CPUFEATURE(PROC_TRACE,5*32+25) /*   Processor Tracing feature */
 XEN_CPUFEATURE(AVX512PF,  5*32+26) /*A  AVX-512 Prefetch Instructions */
 XEN_CPUFEATURE(AVX512ER,  5*32+27) /*A  AVX-512 Exponent & Reciprocal 
Instrs */
 XEN_CPUFEATURE(AVX512CD,  5*32+28) /*A  AVX-512 Conflict Detection Instrs 
*/
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 7e51d361de..61ebc6c24d 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -130,4 +130,6 @@ struct vnuma_info {
 
 void vnuma_destroy(struct vnuma_info *vnuma);
 
+extern bool vmtrace_supported;
+
 #endif /* __XEN_DOMAIN_H__ */
-- 
2.17.1
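
The support check added here combines two independent bits, which can be
sketched in standalone C. This is an illustration under assumptions:
ipt_usable and ipt_cpu_consistent are hypothetical names, and the caller is
assumed to have already read CPUID leaf 7 (subleaf 0) EBX and the
IA32_VMX_MISC MSR. Bit 25 of EBX corresponds to the 5*32+25 feature index
added to cpufeatureset.h, and bit 14 of IA32_VMX_MISC is the 0x4000 mask
used for VMX_MISC_PROC_TRACE.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CPUID7_EBX_PROC_TRACE (1u << 25)    /* CPU implements Intel PT */
#define SKETCH_VMX_MISC_PROC_TRACE   (1ULL << 14)  /* PT usable in VMX operation */

/* Tracing a guest needs both the CPU feature and the VMX capability. */
static bool ipt_usable(uint32_t cpuid7_ebx, uint64_t vmx_misc)
{
    return (cpuid7_ebx & SKETCH_CPUID7_EBX_PROC_TRACE) &&
           (vmx_misc & SKETCH_VMX_MISC_PROC_TRACE);
}

/*
 * Secondary CPUs: if the boot CPU enabled the feature, every other CPU
 * must report the VMX capability too, otherwise the system is rejected.
 */
static bool ipt_cpu_consistent(bool boot_supported, uint64_t vmx_misc)
{
    return !boot_supported || (vmx_misc & SKETCH_VMX_MISC_PROC_TRACE) != 0;
}
```

Note that the check is asymmetric, as in the patch: a secondary CPU that
supports the feature while the boot CPU does not is tolerated, and tracing
simply stays disabled.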




[PATCH v6 02/11] x86/vmx: add Intel PT MSR definitions

2020-07-07 Thread Michał Leszczyński
From: Michal Leszczynski 

Define constants related to Intel Processor Trace features.

Signed-off-by: Michal Leszczynski 
Acked-by: Andrew Cooper 
---
 xen/include/asm-x86/msr-index.h | 24 
 1 file changed, 24 insertions(+)

diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 0fe98af923..4fd54fb5c9 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -72,7 +72,31 @@
 #define MSR_RTIT_OUTPUT_BASE                0x0560
 #define MSR_RTIT_OUTPUT_MASK                0x0561
 #define MSR_RTIT_CTL                        0x0570
+#define  RTIT_CTL_TRACE_EN                  (_AC(1, ULL) <<  0)
+#define  RTIT_CTL_CYC_EN                    (_AC(1, ULL) <<  1)
+#define  RTIT_CTL_OS                        (_AC(1, ULL) <<  2)
+#define  RTIT_CTL_USR                       (_AC(1, ULL) <<  3)
+#define  RTIT_CTL_PWR_EVT_EN                (_AC(1, ULL) <<  4)
+#define  RTIT_CTL_FUP_ON_PTW                (_AC(1, ULL) <<  5)
+#define  RTIT_CTL_FABRIC_EN                 (_AC(1, ULL) <<  6)
+#define  RTIT_CTL_CR3_FILTER                (_AC(1, ULL) <<  7)
+#define  RTIT_CTL_TOPA                      (_AC(1, ULL) <<  8)
+#define  RTIT_CTL_MTC_EN                    (_AC(1, ULL) <<  9)
+#define  RTIT_CTL_TSC_EN                    (_AC(1, ULL) << 10)
+#define  RTIT_CTL_DIS_RETC                  (_AC(1, ULL) << 11)
+#define  RTIT_CTL_PTW_EN                    (_AC(1, ULL) << 12)
+#define  RTIT_CTL_BRANCH_EN                 (_AC(1, ULL) << 13)
+#define  RTIT_CTL_MTC_FREQ                  (_AC(0xf, ULL) << 14)
+#define  RTIT_CTL_CYC_THRESH                (_AC(0xf, ULL) << 19)
+#define  RTIT_CTL_PSB_FREQ                  (_AC(0xf, ULL) << 24)
+#define  RTIT_CTL_ADDR(n)                   (_AC(0xf, ULL) << (32 + 4 * (n)))
 #define MSR_RTIT_STATUS                     0x0571
+#define  RTIT_STATUS_FILTER_EN              (_AC(1, ULL) <<  0)
+#define  RTIT_STATUS_CONTEXT_EN             (_AC(1, ULL) <<  1)
+#define  RTIT_STATUS_TRIGGER_EN             (_AC(1, ULL) <<  2)
+#define  RTIT_STATUS_ERROR                  (_AC(1, ULL) <<  4)
+#define  RTIT_STATUS_STOPPED                (_AC(1, ULL) <<  5)
+#define  RTIT_STATUS_BYTECNT                (_AC(0x1ffff, ULL) << 32)
 #define MSR_RTIT_CR3_MATCH  0x0572
 #define MSR_RTIT_ADDR_A(n) (0x0580 + (n) * 2)
 #define MSR_RTIT_ADDR_B(n) (0x0581 + (n) * 2)
-- 
2.17.1
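
As a worked example of how these bit definitions combine, the sketch below
builds the control value that the VMX patch in this series loads for a traced
guest (TRACE_EN, OS, USR and BRANCH_EN) and shows how a multi-bit field such
as MTC_FREQ can be set and read back. The SK_* macros and function names are
hypothetical and exist only for this illustration; plain 64-bit constants are
used in place of _AC().

```c
#include <stdint.h>

#define SK_BIT(n)            (1ULL << (n))
#define SK_CTL_TRACE_EN      SK_BIT(0)
#define SK_CTL_OS            SK_BIT(2)
#define SK_CTL_USR           SK_BIT(3)
#define SK_CTL_BRANCH_EN     SK_BIT(13)
#define SK_CTL_MTC_FREQ_SHIFT 14
#define SK_CTL_MTC_FREQ_MASK (0xfULL << SK_CTL_MTC_FREQ_SHIFT)

/* Trace both kernel and user mode, with branch packets enabled. */
static uint64_t rtit_ctl_default(void)
{
    return SK_CTL_TRACE_EN | SK_CTL_OS | SK_CTL_USR | SK_CTL_BRANCH_EN;
}

/* Replace the 4-bit MTC_FREQ field, leaving all other bits untouched. */
static uint64_t rtit_ctl_set_mtc_freq(uint64_t ctl, unsigned int freq)
{
    uint64_t field = ((uint64_t)freq << SK_CTL_MTC_FREQ_SHIFT) &
                     SK_CTL_MTC_FREQ_MASK;

    return (ctl & ~SK_CTL_MTC_FREQ_MASK) | field;
}

/* Read the 4-bit MTC_FREQ field back out of a control value. */
static unsigned int rtit_ctl_get_mtc_freq(uint64_t ctl)
{
    return (unsigned int)((ctl & SK_CTL_MTC_FREQ_MASK) >>
                          SK_CTL_MTC_FREQ_SHIFT);
}
```

The default value works out to bits 0, 2, 3 and 13, that is 0x200d.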




Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter

2020-07-07 Thread Michał Leszczyński
- On 7 Jul 2020 at 13:21, Jan Beulich jbeul...@suse.com wrote:

> On 07.07.2020 13:17, Michał Leszczyński wrote:
>> So would it be OK to use uint32_t everywhere and to store the trace buffer
>> size as number of kB? I think this is the most straightforward option.
>> 
>> I would also stick with the name "processor_trace_buf_size"
>> everywhere, both in the hypervisor, ABI and the toolstack, with the
>> respective comments that the size is in kB.
> 
> Perhaps even more clearly "processor_trace_buf_kb" then?
> 
> Jan


Ok.

Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter

2020-07-07 Thread Michał Leszczyński
- On 7 Jul 2020 at 11:16, Julien Grall jul...@xen.org wrote:

> On 07/07/2020 10:10, Jan Beulich wrote:
>> On 07.07.2020 10:44, Julien Grall wrote:
>>> Hi,
>>>
>>> On 06/07/2020 09:46, Jan Beulich wrote:
>>>> On 04.07.2020 19:23, Julien Grall wrote:
>>>>> Hi,
>>>>>
>>>>> On 03/07/2020 11:11, Roger Pau Monné wrote:
>>>>>> On Fri, Jul 03, 2020 at 11:56:38AM +0200, Jan Beulich wrote:
>>>>>>> On 03.07.2020 11:44, Roger Pau Monné wrote:
>>>>>>>> On Thu, Jul 02, 2020 at 06:23:28PM +0200, Michał Leszczyński wrote:
>>>>>>>>> In previous versions it was "size" but it was requested to change it
>>>>>>>>> to "order" in order to shrink the variable size from uint64_t to
>>>>>>>>> uint8_t, because there is limited space for xen_domctl_createdomain
>>>>>>>>> structure.
>>>>>>>>
>>>>>>>> It's likely I'm missing something here, but I wasn't aware
>>>>>>>> xen_domctl_createdomain had any constrains regarding it's size. It's
>>>>>>>> currently 48bytes which seems fairly small.
>>>>>>>
>>>>>>> Additionally I would guess a uint32_t could do here, if the value
>>>>>>> passed was "number of pages" rather than "number of bytes"?
>>>>> Looking at the rest of the code, the toolstack accepts a 64-bit value.
>>>>> So this would lead to truncation of the buffer if it is bigger than 2^44
>>>>> bytes.
>>>>>
>>>>> I agree such buffer is unlikely, yet I still think we want to harden the
>>>>> code whenever we can. So the solution is to either prevent check
>>>>> truncation in libxl or directly use 64-bit in the domctl.
>>>>>
>>>>> My preference is the latter.
>>>>>
>>>>>>
>>>>>> That could work, not sure if it needs to state however that those will
>>>>>> be 4K pages, since Arm can have a different minimum page size IIRC?
>>>>>> (or that's already the assumption for all number of frames fields)
>>>>>> vmtrace_nr_frames seems fine to me.
>>>>>
>>>>> The hypercalls interface is using the same page granularity as the
>>>>> hypervisor (i.e 4KB).
>>>>>
>>>>> While we already support guest using 64KB page granularity, it is
>>>>> impossible to have a 64KB Arm hypervisor in the current state. You are
>>>>> going to either break existing guest (if you switch to 64KB page
>>>>> granularity for the hypercall ABI) or render them insecure (the mimimum
>>>>> mapping in the P2M would be 64KB).
>>>>>
>>>>> DOMCTLs are not stable yet, so using a number of pages is OK. However, I
>>>>> would strongly suggest to use a number of bytes for any xl/libxl/stable
>>>>> libraries interfaces as this avoids confusion and also make more
>>>>> futureproof.
>>>>
>>>> If we can't settle on what "page size" means in the public interface
>>>> (which imo is embarrassing), then how about going with number of kb,
>>>> like other memory libxl controls do? (I guess using Mb, in line with
>>>> other config file controls, may end up being too coarse here.) This
>>>> would likely still allow for a 32-bit field to be wide enough.
>>>
>>> A 32-bit field would definitely not be able to cover a full address
>>> space. So do you mind to explain what is the upper bound you expect here?
>> 
>> Do you foresee a need for buffer sizes of 4Tb and up?
> 
> Not I am aware of... However, I think the question was worth it given
> that "wide enough" can mean anything.
> 
> Cheers,
> 
> --
> Julien Grall


So would it be OK to use uint32_t everywhere and to store the trace buffer
size as number of kB? I think this is the most straightforward option.

I would also stick with the name "processor_trace_buf_size"
everywhere, both in the hypervisor, ABI and the toolstack, with the
respective comments that the size is in kB.


Best regards,
Michał Leszczyński
CERT Polska
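
The range of a 32-bit size field counted in kB can be checked with a little
arithmetic. This is a sketch only; buf_kb_to_bytes and bytes_fit_in_kb_field
are hypothetical helper names. The largest representable size is
(2^32 - 1) * 1024 bytes, which is 1 kB short of 4 TiB, and the conversion to
bytes has to be done in 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a kB count to bytes; widen first so the shift cannot overflow. */
static uint64_t buf_kb_to_bytes(uint32_t kb)
{
    return (uint64_t)kb << 10;
}

/*
 * Can a byte count be stored losslessly in a uint32_t kB field?
 * It must be a multiple of 1024 and must not exceed UINT32_MAX kB.
 */
static bool bytes_fit_in_kb_field(uint64_t bytes)
{
    return (bytes & 1023) == 0 && (bytes >> 10) <= UINT32_MAX;
}
```

A toolstack that accepts a 64-bit byte count could use a check of this kind
to reject values that would otherwise be truncated.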



Re: [PATCH v5 06/11] x86/hvm: processor trace interface in HVM

2020-07-06 Thread Michał Leszczyński
- On 6 Jul 2020 at 10:31, Jan Beulich jbeul...@suse.com wrote:

> On 05.07.2020 21:11, Michał Leszczyński wrote:
>> - 5 lip 2020 o 20:54, Michał Leszczyński michal.leszczyn...@cert.pl
>> napisał(a):
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -2199,6 +2199,25 @@ int domain_relinquish_resources(struct domain *d)
>>> altp2m_vcpu_disable_ve(v);
>>> }
>>>
>>> +for_each_vcpu ( d, v )
>>> +{
>>> +unsigned int i;
>>> +
>>> +if ( !v->vmtrace.pt_buf )
>>> +continue;
>>> +
>>> +for ( i = 0; i < (v->domain->vmtrace_pt_size >> PAGE_SHIFT); 
>>> i++ )
>>> +{
>>> +struct page_info *pg = mfn_to_page(
>>> +mfn_add(page_to_mfn(v->vmtrace.pt_buf), i));
>>> +if ( (pg->count_info & PGC_count_mask) != 1 )
>>> +return -EBUSY;
>>> +}
>>> +
>>> +free_domheap_pages(v->vmtrace.pt_buf,
>>> +get_order_from_bytes(v->domain->vmtrace_pt_size));
>> 
>> 
>> While this works, I don't feel that this is a good solution with this loop
>> returning -EBUSY here. I would like to kindly ask for suggestions regarding
>> this topic.
> 
> I'm sorry to ask, but with the previously give suggestions to mirror
> existing code, why do you still need to play with this function? You
> really shouldn't have a need to, just like e.g. the ioreq server page
> handling code didn't.
> 
> Jan


Ok, sorry. I think I've finally got it after Roger's latest suggestions :P

This will be fixed in the next version.


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v5 11/11] tools/proctrace: add proctrace tool

2020-07-06 Thread Michał Leszczyński
- 6 lip 2020 o 11:47, Andrew Cooper andrew.coop...@citrix.com napisał(a):

> On 06/07/2020 09:33, Jan Beulich wrote:
>> On 05.07.2020 20:58, Michał Leszczyński wrote:
>>> - 5 lip 2020 o 20:55, Michał Leszczyński michal.leszczyn...@cert.pl
>>> napisał(a):
>>>> --- /dev/null
>>>> +++ b/tools/proctrace/proctrace.c
>>>> +#include 
>>>> +#include 
>>>> +#include 
>>>> +#include 
>>>> +
>>>> +#include 
>>>> +#include 
>>>> +#include 
>>>> +
>>>> +#define BUF_SIZE (16384 * XC_PAGE_SIZE)
>>> I would like to discuss here, how we should retrieve the trace buffer size
>>> in runtime? Should there be a hypercall for it, or some extension to
>>> acquire_resource logic?
>> Personally I'd prefer the latter, but the question is whether one can
>> be made in a backwards compatible way.
> 
> I already covered this in v4.
> 
> ~Andrew


Ok, sorry, I see:

> The guest_handle_is_null(xmar.frame_list) path
> in Xen is supposed to report the size of the resource, not the size of
> Xen's local buffer, so userspace can ask "how large is this resource".
> 
> I'll try and find some time to fix this and arrange for backports, but
> the current behaviour is nonsense, and problematic for new users.

So, to be clear: should I modify the acquire_resource logic
so that a NULL guest handle reports the actual
size of the resource?

If I got it right, here:

https://lists.xen.org/archives/html/xen-devel/2020-07/msg00159.html

it was suggested that it should report the constant value of
UINT_MAX >> MEMOP_EXTENT_SHIFT. As far as I understood, the expectation
is that it would report how many frames may be requested at once,
not the size of the resource we're asking for.
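
To put a number on that constant, here is a small sketch. It assumes
MEMOP_EXTENT_SHIFT is 6 (my reading of Xen's public memory.h), 4 kB pages
and a 32-bit unsigned int. All three are assumptions made for illustration,
and the helper names are made up.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define MEMOP_EXTENT_SHIFT 6   /* assumed value, see the note above */
#define PAGE_SHIFT_4K      12  /* assumed 4 kB pages */

/* Maximum number of frames that still leaves room to encode a continuation. */
static inline unsigned int max_frames_per_request(void)
{
    return UINT_MAX >> MEMOP_EXTENT_SHIFT;
}

/* The same limit expressed in bytes. */
static inline uint64_t max_bytes_per_request(void)
{
    return (uint64_t)max_frames_per_request() << PAGE_SHIFT_4K;
}

/* Would a page-aligned buffer of buf_bytes fit under the reported limit? */
static inline int buffer_fits_in_one_request(uint64_t buf_bytes)
{
    assert((buf_bytes & ((1ULL << PAGE_SHIFT_4K) - 1)) == 0);
    return (buf_bytes >> PAGE_SHIFT_4K) <= max_frames_per_request();
}
```

Under those assumptions the limit is far larger than any realistic trace
buffer, so it says nothing about how big a particular resource actually is.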


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v5 04/11] common: add vmtrace_pt_size domain parameter

2020-07-06 Thread Michał Leszczyński
- 5 lip 2020 o 20:54, Michał Leszczyński michal.leszczyn...@cert.pl 
napisał(a):

> From: Michal Leszczynski 
> 
> Add vmtrace_pt_size domain parameter in live domain and
> vmtrace_pt_order parameter in xen_domctl_createdomain.
> 
> Signed-off-by: Michal Leszczynski 
> ---
> xen/common/domain.c | 12 
> xen/include/public/domctl.h |  1 +
> xen/include/xen/sched.h |  4 
> 3 files changed, 17 insertions(+)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index a45cf023f7..25d3359c5b 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -338,6 +338,12 @@ static int sanitise_domain_config(struct
> xen_domctl_createdomain *config)
> return -EINVAL;
> }
> 
> +if ( config->vmtrace_pt_order && !vmtrace_supported )
> +{
> +dprintk(XENLOG_INFO, "Processor tracing is not supported\n");
> +return -EINVAL;
> +}
> +
> return arch_sanitise_domain_config(config);
> }
> 
> @@ -443,6 +449,12 @@ struct domain *domain_create(domid_t domid,
> d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
> 
> radix_tree_init(&d->pirq_tree);
> +
> +if ( config->vmtrace_pt_order )
> +{
> +uint32_t shift_val = config->vmtrace_pt_order + PAGE_SHIFT;
> +d->vmtrace_pt_size = (1ULL << shift_val);
> +}
> }
> 
> if ( (err = arch_domain_create(d, config)) != 0 )
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 59bdc28c89..7b8289d436 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
> uint32_t max_evtchn_port;
> int32_t max_grant_frames;
> int32_t max_maptrack_frames;
> +uint8_t vmtrace_pt_order;
> 
> struct xen_arch_domainconfig arch;
> };
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index ac53519d7f..48f0a61bbd 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -457,6 +457,10 @@ struct domain
> unsignedpbuf_idx;
> spinlock_t  pbuf_lock;
> 
> +/* Used by vmtrace features */
> +spinlock_t  vmtrace_lock;
> +uint64_tvmtrace_pt_size;
> +
> /* OProfile support. */
> struct xenoprof *xenoprof;
> 
> --
> 2.17.1


Just a note to myself: in v4 it was suggested by Jan that we should
go with "number of kB" instead of "number of bytes" and the type
could be uint32_t.

I will change this accordingly in the next version.
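
For reference, a sketch of how the v5 "order" relates to a size in bytes
and in kB. It assumes 4 kB pages (PAGE_SHIFT of 12, as on x86), and the
helper names are invented for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumed 4 kB pages, as on x86 */

/* Size in bytes of a buffer of 2^order pages (what v5 stores). */
static inline uint64_t order_to_bytes(uint8_t order)
{
    assert(order + PAGE_SHIFT < 64);   /* keep the shift well defined */
    return 1ULL << (order + PAGE_SHIFT);
}

/* The same size as a number of kB, as suggested for the next version. */
static inline uint32_t order_to_kb(uint8_t order)
{
    uint64_t kb = order_to_bytes(order) >> 10;

    assert(kb <= UINT32_MAX);          /* must fit the proposed uint32_t */
    return (uint32_t)kb;
}
```

For example, order 14 corresponds to 16384 pages, i.e. 64 MB or 65536 kB.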


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v5 06/11] x86/hvm: processor trace interface in HVM

2020-07-06 Thread Michał Leszczyński
- 6 lip 2020 o 10:42, Roger Pau Monné roger@citrix.com napisał(a):

> On Sun, Jul 05, 2020 at 08:54:59PM +0200, Michał Leszczyński wrote:
>> From: Michal Leszczynski 
>> 
>> Implement necessary changes in common code/HVM to support
>> processor trace features. Define vmtrace_pt_* API and
>> implement trace buffer allocation/deallocation in common
>> code.
>> 
>> Signed-off-by: Michal Leszczynski 
>> ---
>>  xen/arch/x86/domain.c | 19 +++
>>  xen/common/domain.c   | 19 +++
>>  xen/include/asm-x86/hvm/hvm.h | 20 
>>  xen/include/xen/sched.h   |  4 
>>  4 files changed, 62 insertions(+)
>> 
>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index fee6c3931a..79c9794408 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -2199,6 +2199,25 @@ int domain_relinquish_resources(struct domain *d)
>>  altp2m_vcpu_disable_ve(v);
>>  }
>>  
>> +for_each_vcpu ( d, v )
>> +{
>> +unsigned int i;
>> +
>> +if ( !v->vmtrace.pt_buf )
>> +continue;
>> +
>> +for ( i = 0; i < (v->domain->vmtrace_pt_size >> PAGE_SHIFT); 
>> i++ )
>> +{
>> +struct page_info *pg = mfn_to_page(
>> +mfn_add(page_to_mfn(v->vmtrace.pt_buf), i));
>> +if ( (pg->count_info & PGC_count_mask) != 1 )
>> +return -EBUSY;
>> +}
>> +
>> +free_domheap_pages(v->vmtrace.pt_buf,
>> +get_order_from_bytes(v->domain->vmtrace_pt_size));
> 
> This is racy as a control domain could take a reference between the
> check and the freeing.
> 
>> +}
>> +
>>  if ( is_pv_domain(d) )
>>  {
>>  for_each_vcpu ( d, v )
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 25d3359c5b..f480c4e033 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -137,6 +137,21 @@ static void vcpu_destroy(struct vcpu *v)
>>  free_vcpu_struct(v);
>>  }
>>  
>> +static int vmtrace_alloc_buffers(struct vcpu *v)
>> +{
>> +struct page_info *pg;
>> +uint64_t size = v->domain->vmtrace_pt_size;
>> +
>> +pg = alloc_domheap_pages(v->domain, get_order_from_bytes(size),
>> + MEMF_no_refcount);
>> +
>> +if ( !pg )
>> +return -ENOMEM;
>> +
>> +v->vmtrace.pt_buf = pg;
>> +return 0;
>> +}
> 
> I think we already agreed that you would use the same model as ioreq
> servers, where a reference is taken on allocation and then the pages
> are not explicitly freed on domain destruction and put_page_and_type
> is used. Is there some reason why that model doesn't work in this
> case?
> 
> If not, please see hvm_alloc_ioreq_mfn and hvm_free_ioreq_mfn.
> 
> Roger.


Ok, I've got it, will do. Thanks for pointing out the examples.


One thing that still confuses me is the meaning of the
MEMF_no_refcount flag.

In hvm_{alloc,free}_ioreq_mfn the memory is allocated
explicitly but freed just by dropping the reference, so
I guess it's detected automatically that the refcount dropped to 0
and the page should be freed? If so, why is the flag named "no refcount"?
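
The behaviour guessed at above can be pictured with a toy model. This is
not Xen code: toy_page, toy_alloc_page, toy_get_page and toy_put_page are
invented names, and the model says nothing about what MEMF_no_refcount
itself does. It only shows "free when the last reference is dropped".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a refcounted page; not Xen's struct page_info. */
struct toy_page {
    unsigned int refcount;
    bool freed;
};

/* "Allocate" the page with one reference already held by the owner. */
static inline void toy_alloc_page(struct toy_page *pg)
{
    pg->refcount = 1;
    pg->freed = false;
}

/* Take an extra reference, e.g. while someone else has the page mapped. */
static inline bool toy_get_page(struct toy_page *pg)
{
    if ( pg->freed || pg->refcount == 0 )
        return false;
    pg->refcount++;
    return true;
}

/* Drop a reference; the page is freed when the last one goes away. */
static inline void toy_put_page(struct toy_page *pg)
{
    assert(pg->refcount > 0);
    if ( --pg->refcount == 0 )
        pg->freed = true;
}
```

In this model there is no separate "free" call at all: the owner drops its
allocation reference, and the page goes away once any other holders have
dropped theirs too.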


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v5 06/11] x86/hvm: processor trace interface in HVM

2020-07-05 Thread Michał Leszczyński
- 5 lip 2020 o 20:54, Michał Leszczyński michal.leszczyn...@cert.pl 
napisał(a):

> From: Michal Leszczynski 
> 
> Implement necessary changes in common code/HVM to support
> processor trace features. Define vmtrace_pt_* API and
> implement trace buffer allocation/deallocation in common
> code.
> 
> Signed-off-by: Michal Leszczynski 
> ---
> xen/arch/x86/domain.c | 19 +++
> xen/common/domain.c   | 19 +++
> xen/include/asm-x86/hvm/hvm.h | 20 
> xen/include/xen/sched.h   |  4 
> 4 files changed, 62 insertions(+)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index fee6c3931a..79c9794408 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -2199,6 +2199,25 @@ int domain_relinquish_resources(struct domain *d)
> altp2m_vcpu_disable_ve(v);
> }
> 
> +for_each_vcpu ( d, v )
> +{
> +unsigned int i;
> +
> +if ( !v->vmtrace.pt_buf )
> +continue;
> +
> +for ( i = 0; i < (v->domain->vmtrace_pt_size >> PAGE_SHIFT); i++ 
> )
> +{
> +struct page_info *pg = mfn_to_page(
> +mfn_add(page_to_mfn(v->vmtrace.pt_buf), i));
> +if ( (pg->count_info & PGC_count_mask) != 1 )
> +return -EBUSY;
> +}
> +
> +free_domheap_pages(v->vmtrace.pt_buf,
> +get_order_from_bytes(v->domain->vmtrace_pt_size));


While this works, I don't think having this loop return -EBUSY here is a good
solution. I would appreciate suggestions on how to handle this better.


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v5 05/11] tools/libxl: add vmtrace_pt_size parameter

2020-07-05 Thread Michał Leszczyński
- 5 lip 2020 o 20:54, Michał Leszczyński michal.leszczyn...@cert.pl 
napisał(a):

> From: Michal Leszczynski 
> 
> Allow to specify the size of per-vCPU trace buffer upon
> domain creation. This is zero by default (meaning: not enabled).
> 
> Signed-off-by: Michal Leszczynski 
> ---
> docs/man/xl.cfg.5.pod.in | 11 +++
> tools/golang/xenlight/helpers.gen.go |  2 ++
> tools/golang/xenlight/types.gen.go   |  1 +
> tools/libxl/libxl.h  |  8 
> tools/libxl/libxl_create.c   |  1 +
> tools/libxl/libxl_types.idl  |  2 ++
> tools/xl/xl_parse.c  | 22 ++
> 7 files changed, 47 insertions(+)
> 
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index 0532739c1f..670759f6bd 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -278,6 +278,17 @@ memory=8096 will report significantly less memory 
> available
> for use
> than a system with maxmem=8096 memory=8096 due to the memory overhead
> of having to track the unused pages.
> 
> +=item B
> +
> +Specifies the size of processor trace buffer that would be allocated
> +for each vCPU belonging to this domain. Disabled (i.e.
> +B by default. This must be set to
> +non-zero value in order to be able to use processor tracing features
> +with this domain.
> +
> +B: The size value must be between 4 kB and 4 GB and it must
> +be also a power of 2.
> +
> =back
> 
> =head3 Guest Virtual NUMA Configuration
> diff --git a/tools/golang/xenlight/helpers.gen.go
> b/tools/golang/xenlight/helpers.gen.go
> index 152c7e8e6b..bfc37b69c8 100644
> --- a/tools/golang/xenlight/helpers.gen.go
> +++ b/tools/golang/xenlight/helpers.gen.go
> @@ -1117,6 +1117,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
> x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
> x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
> x.Altp2M = Altp2MMode(xc.altp2m)
> +x.VmtracePtOrder = int(xc.vmtrace_pt_order)
> 
>  return nil}
> 
> @@ -1592,6 +1593,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
> xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
> xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
> xc.altp2m = C.libxl_altp2m_mode(x.Altp2M)
> +xc.vmtrace_pt_order = C.int(x.VmtracePtOrder)
> 
>  return nil
>  }
> diff --git a/tools/golang/xenlight/types.gen.go
> b/tools/golang/xenlight/types.gen.go
> index 663c1e86b4..f9b07ac862 100644
> --- a/tools/golang/xenlight/types.gen.go
> +++ b/tools/golang/xenlight/types.gen.go
> @@ -516,6 +516,7 @@ GicVersion GicVersion
> Vuart VuartType
> }
> Altp2M Altp2MMode
> +VmtracePtOrder int
> }
> 
> type domainBuildInfoTypeUnion interface {
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 1cd6c38e83..4abb521756 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -438,6 +438,14 @@
>  */
> #define LIBXL_HAVE_CREATEINFO_PASSTHROUGH 1
> 
> +/*
> + * LIBXL_HAVE_VMTRACE_PT_ORDER indicates that
> + * libxl_domain_create_info has a vmtrace_pt_order parameter, which
> + * allows to enable pre-allocation of processor tracing buffers
> + * with the given order of size.
> + */
> +#define LIBXL_HAVE_VMTRACE_PT_ORDER 1
> +
> /*
>  * libxl ABI compatibility
>  *
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 2814818e34..82b595161a 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -608,6 +608,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config
> *d_config,
> .max_evtchn_port = b_info->event_channels,
> .max_grant_frames = b_info->max_grant_frames,
> .max_maptrack_frames = b_info->max_maptrack_frames,
> +.vmtrace_pt_order = b_info->vmtrace_pt_order,
> };
> 
> if (info->type != LIBXL_DOMAIN_TYPE_PV) {
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 9d3f05f399..1c5dd43e4d 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -645,6 +645,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
> # supported by x86 HVM and ARM support is planned.
> ("altp2m", libxl_altp2m_mode),
> 
> +("vmtrace_pt_order", integer),
> +
> ], dir=DIR_IN,
>copy_deprecated_fn="libxl__domain_build_info_copy_deprecated",
> )
> diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
> index 61b4ef7b7e..279f7c14d3 100644
> --- a/tools/xl/xl_parse.c
> +++ b/tools/xl/xl_parse.c
> @@ -1861,6 +1861,28 @@ void parse_config_data(const c

Re: [PATCH v5 11/11] tools/proctrace: add proctrace tool

2020-07-05 Thread Michał Leszczyński
- 5 lip 2020 o 20:55, Michał Leszczyński michal.leszczyn...@cert.pl 
napisał(a):

> From: Michal Leszczynski 
> 
> Add an demonstration tool that uses xc_vmtrace_* calls in order
> to manage external IPT monitoring for DomU.
> 
> Signed-off-by: Michal Leszczynski 
> ---
> tools/proctrace/Makefile|  48 +++
> tools/proctrace/proctrace.c | 163 
> 2 files changed, 211 insertions(+)
> create mode 100644 tools/proctrace/Makefile
> create mode 100644 tools/proctrace/proctrace.c


> diff --git a/tools/proctrace/proctrace.c b/tools/proctrace/proctrace.c
> new file mode 100644
> index 00..22bf91db8d
> --- /dev/null
> +++ b/tools/proctrace/proctrace.c


> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +#define BUF_SIZE (16384 * XC_PAGE_SIZE)


I would like to discuss how we should retrieve the trace buffer size
at runtime. Should there be a hypercall for it, or some extension to the
acquire_resource logic?

Best regards,
Michał Leszczyński
CERT Polska



[PATCH v5 10/11] tools/libxc: add xc_vmtrace_* functions

2020-07-05 Thread Michał Leszczyński
From: Michal Leszczynski 

Add functions in libxc that use the new XEN_DOMCTL_vmtrace interface.

Signed-off-by: Michal Leszczynski 
---
 tools/libxc/Makefile  |  1 +
 tools/libxc/include/xenctrl.h | 39 +++
 tools/libxc/xc_vmtrace.c  | 73 +++
 3 files changed, 113 insertions(+)
 create mode 100644 tools/libxc/xc_vmtrace.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index fae5969a73..605e44501d 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -27,6 +27,7 @@ CTRL_SRCS-y   += xc_csched2.c
 CTRL_SRCS-y   += xc_arinc653.c
 CTRL_SRCS-y   += xc_rt.c
 CTRL_SRCS-y   += xc_tbuf.c
+CTRL_SRCS-y   += xc_vmtrace.c
 CTRL_SRCS-y   += xc_pm.c
 CTRL_SRCS-y   += xc_cpu_hotplug.c
 CTRL_SRCS-y   += xc_resume.c
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 4c89b7294c..34f27fd7d4 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1585,6 +1585,45 @@ int xc_tbuf_set_cpu_mask(xc_interface *xch, xc_cpumap_t 
mask);
 
 int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask);
 
+/**
+ * Enable processor trace for given vCPU in given DomU.
+ * Allocate the trace ringbuffer with given size.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_enable(xc_interface *xch, uint32_t domid,
+ uint32_t vcpu);
+
+/**
+ * Disable processor trace for given vCPU in given DomU.
+ * Deallocate the trace ringbuffer.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_disable(xc_interface *xch, uint32_t domid,
+  uint32_t vcpu);
+
+/**
+ * Get current offset inside the trace ringbuffer.
+ * This allows the caller to determine how much data was written into the
+ * buffer. Once the buffer overflows, the offset resets to 0 and the
+ * previous data is overwritten.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @parm offset current offset inside trace buffer will be written there
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_get_offset(xc_interface *xch, uint32_t domid,
+ uint32_t vcpu, uint64_t *offset);
+
 int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
 
diff --git a/tools/libxc/xc_vmtrace.c b/tools/libxc/xc_vmtrace.c
new file mode 100644
index 00..32f90a6203
--- /dev/null
+++ b/tools/libxc/xc_vmtrace.c
@@ -0,0 +1,73 @@
+/**
+ * xc_vmtrace.c
+ *
+ * API for manipulating hardware tracing features
+ *
+ * Copyright (c) 2020, Michal Leszczynski
+ *
+ * Copyright 2020 CERT Polska. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "xc_private.h"
+#include 
+
+int xc_vmtrace_pt_enable(
+xc_interface *xch, uint32_t domid, uint32_t vcpu)
+{
+DECLARE_DOMCTL;
+int rc;
+
+domctl.cmd = XEN_DOMCTL_vmtrace_op;
+domctl.domain = domid;
+domctl.u.vmtrace_op.cmd = XEN_DOMCTL_vmtrace_pt_enable;
+domctl.u.vmtrace_op.vcpu = vcpu;
+
+rc = do_domctl(xch, &domctl);
+return rc;
+}
+
+int xc_vmtrace_pt_get_offset(
+xc_interface *xch, uint32_t domid, uint32_t vcpu, uint64_t *offset)
+{
+DECLARE_DOMCTL;
+int rc;
+
+domctl.cmd = XEN_DOMCTL_vmtrace_op;
+domctl.domain = domid;
+domctl.u.vmtrace_op.cmd = XEN_DOMCTL_vmtrace_pt_get_offset;
+domctl.u.vmtrace_op.vcpu = vcpu;
+
+rc = do_domctl(xch, &domctl);
+if ( !rc )
+*offset = domctl.u.vmtrace_op.offset;
+return rc;
+}
+
+int xc_vmtrace_pt_disable(xc_interface *xch, uint32_t domid, uint32_t vcpu)
+{
+DECLARE_DOMCTL;
+int rc;
+
+domctl.cmd = XEN_DOMCTL_vmtrace_op;
+domctl.domain = domid;
+domctl.u.vmtrace_op.cmd = XEN_DOMCTL_vmtrace_pt_disable;
+domctl.u.vmtrace_op.vcpu = vcpu;
+
+rc = do_domctl(xch, &domctl);
+return rc;
+}
+
-- 
2.17.1
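
A note on consuming the offset returned by xc_vmtrace_pt_get_offset: since
the buffer is a ring, a reader has to handle the case where the offset has
wrapped since the last poll. Below is a minimal sketch of that logic. The
function name ring_copy_new_data is made up, and the sketch assumes the
buffer wraps at most once between two polls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the bytes written between last_offset and offset out of a ring
 * buffer of buf_size bytes. Returns the number of bytes copied into out,
 * which must have room for buf_size bytes.
 */
static inline size_t ring_copy_new_data(const uint8_t *buf, size_t buf_size,
                                        uint64_t last_offset, uint64_t offset,
                                        uint8_t *out)
{
    size_t tail;

    assert(last_offset < buf_size && offset < buf_size);

    if ( offset >= last_offset )
    {
        /* No wrap: the new data is one contiguous span. */
        memcpy(out, buf + last_offset, offset - last_offset);
        return offset - last_offset;
    }

    /* Wrapped: tail of the buffer first, then the start up to offset. */
    tail = buf_size - last_offset;
    memcpy(out, buf + last_offset, tail);
    memcpy(out + tail, buf, offset);
    return tail + offset;
}
```

An unchanged offset is treated as "no new data" here. A full wrap that
lands on exactly the same offset cannot be told apart from that, so a real
consumer would need to poll often enough relative to the buffer size.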




[PATCH v5 04/11] common: add vmtrace_pt_size domain parameter

2020-07-05 Thread Michał Leszczyński
From: Michal Leszczynski 

Add vmtrace_pt_size domain parameter in live domain and
vmtrace_pt_order parameter in xen_domctl_createdomain.

Signed-off-by: Michal Leszczynski 
---
 xen/common/domain.c | 12 
 xen/include/public/domctl.h |  1 +
 xen/include/xen/sched.h |  4 
 3 files changed, 17 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index a45cf023f7..25d3359c5b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -338,6 +338,12 @@ static int sanitise_domain_config(struct 
xen_domctl_createdomain *config)
 return -EINVAL;
 }
 
+if ( config->vmtrace_pt_order && !vmtrace_supported )
+{
+dprintk(XENLOG_INFO, "Processor tracing is not supported\n");
+return -EINVAL;
+}
+
 return arch_sanitise_domain_config(config);
 }
 
@@ -443,6 +449,12 @@ struct domain *domain_create(domid_t domid,
 d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
 
 radix_tree_init(&d->pirq_tree);
+
+if ( config->vmtrace_pt_order )
+{
+uint32_t shift_val = config->vmtrace_pt_order + PAGE_SHIFT;
+d->vmtrace_pt_size = (1ULL << shift_val);
+}
 }
 
 if ( (err = arch_domain_create(d, config)) != 0 )
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 59bdc28c89..7b8289d436 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
 uint32_t max_evtchn_port;
 int32_t max_grant_frames;
 int32_t max_maptrack_frames;
+uint8_t vmtrace_pt_order;
 
 struct xen_arch_domainconfig arch;
 };
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ac53519d7f..48f0a61bbd 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -457,6 +457,10 @@ struct domain
 unsignedpbuf_idx;
 spinlock_t  pbuf_lock;
 
+/* Used by vmtrace features */
+spinlock_t  vmtrace_lock;
+uint64_tvmtrace_pt_size;
+
 /* OProfile support. */
 struct xenoprof *xenoprof;
 
-- 
2.17.1




[PATCH v5 01/11] memory: batch processing in acquire_resource()

2020-07-05 Thread Michał Leszczyński
From: Michal Leszczynski 

Allow large resources to be acquired by letting acquire_resource()
process items in batches, using hypercall continuation.

Be aware that this modifies the behavior of acquire_resource
call with frame_list=NULL. While previously it would return
the size of internal array (32), with this patch it returns
the maximal quantity of frames that could be requested at once,
i.e. UINT_MAX >> MEMOP_EXTENT_SHIFT.

Signed-off-by: Michal Leszczynski 
---
 xen/common/memory.c | 49 -
 1 file changed, 44 insertions(+), 5 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c1e5..eb42f883df 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1046,10 +1046,12 @@ static int acquire_grant_table(struct domain *d, 
unsigned int id,
 }
 
 static int acquire_resource(
-XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
+XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg,
+unsigned long *start_extent)
 {
 struct domain *d, *currd = current->domain;
 xen_mem_acquire_resource_t xmar;
+uint32_t total_frames;
 /*
  * The mfn_list and gfn_list (below) arrays are ok on stack for the
  * moment since they are small, but if they need to grow in future
@@ -1069,7 +1071,7 @@ static int acquire_resource(
 if ( xmar.nr_frames )
 return -EINVAL;
 
-xmar.nr_frames = ARRAY_SIZE(mfn_list);
+xmar.nr_frames = UINT_MAX >> MEMOP_EXTENT_SHIFT;
 
 if ( __copy_field_to_guest(arg, &xmar, nr_frames) )
 return -EFAULT;
@@ -1077,8 +1079,28 @@ static int acquire_resource(
 return 0;
 }
 
+total_frames = xmar.nr_frames;
+
+/* Is the size too large for us to encode a continuation? */
+if ( unlikely(xmar.nr_frames > (UINT_MAX >> MEMOP_EXTENT_SHIFT)) )
+return -EINVAL;
+
+if ( *start_extent )
+{
+/*
+ * Check whether start_extent is in bounds, as this
+ * value is visible to the calling domain.
+ */
+if ( *start_extent > xmar.nr_frames )
+return -EINVAL;
+
+xmar.frame += *start_extent;
+xmar.nr_frames -= *start_extent;
+guest_handle_add_offset(xmar.frame_list, *start_extent);
+}
+
 if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
-return -E2BIG;
+xmar.nr_frames = ARRAY_SIZE(mfn_list);
 
 rc = rcu_lock_remote_domain_by_id(xmar.domid, &d);
 if ( rc )
@@ -1135,6 +1157,14 @@ static int acquire_resource(
 }
 }
 
+if ( !rc )
+{
+*start_extent += xmar.nr_frames;
+
+if ( *start_extent != total_frames )
+rc = -ERESTART;
+}
+
  out:
 rcu_unlock_domain(d);
 
@@ -1599,8 +1629,17 @@ long do_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 #endif
 
 case XENMEM_acquire_resource:
-rc = acquire_resource(
-guest_handle_cast(arg, xen_mem_acquire_resource_t));
+do {
+rc = acquire_resource(
+guest_handle_cast(arg, xen_mem_acquire_resource_t),
+&start_extent);
+
+if ( hypercall_preempt_check() )
+return hypercall_create_continuation(
+__HYPERVISOR_memory_op, "lh",
+op | (start_extent << MEMOP_EXTENT_SHIFT), arg);
+} while ( rc == -ERESTART );
+
 break;
 
 default:
-- 
2.17.1
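
The batching scheme in the patch above can be pictured with a
self-contained toy model. This is not the hypervisor code: the toy_ names
and TOY_RESTART are invented, preemption is not modelled, and the "work"
is just a counter. It only shows how start_extent advances one batch at a
time until all frames have been handled.

```c
#include <assert.h>
#include <stdint.h>

#define BATCH_SIZE  32   /* mirrors the 32-entry internal array */
#define TOY_RESTART 1    /* stand-in for -ERESTART; not Xen's value */

/*
 * Process at most BATCH_SIZE frames starting at *start_extent.
 * Returns 0 when everything is done, TOY_RESTART when more work remains,
 * and -1 if start_extent is out of bounds.
 */
static inline int toy_acquire_batch(uint32_t total_frames,
                                    unsigned long *start_extent,
                                    unsigned int *frames_processed)
{
    uint32_t remaining;

    if ( *start_extent > total_frames )
        return -1;

    remaining = total_frames - (uint32_t)*start_extent;
    if ( remaining > BATCH_SIZE )
        remaining = BATCH_SIZE;

    *frames_processed += remaining;  /* the real code maps frames here */
    *start_extent += remaining;

    return (*start_extent != total_frames) ? TOY_RESTART : 0;
}

/* Drive the batches to completion; returns the number of calls needed. */
static inline unsigned int toy_acquire_all(uint32_t total_frames,
                                           unsigned int *frames_processed)
{
    unsigned long start_extent = 0;
    unsigned int calls = 0;
    int rc;

    do {
        rc = toy_acquire_batch(total_frames, &start_extent, frames_processed);
        calls++;
    } while ( rc == TOY_RESTART );

    assert(rc == 0);
    return calls;
}
```

For instance, a request for 100 frames takes four calls in this model
(32 + 32 + 32 + 4). In the patch the loop can additionally be interrupted
by a continuation, with start_extent carried in the upper bits of the
command word.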




[PATCH v5 11/11] tools/proctrace: add proctrace tool

2020-07-05 Thread Michał Leszczyński
From: Michal Leszczynski 

Add a demonstration tool that uses xc_vmtrace_* calls in order
to manage external IPT monitoring for DomU.

Signed-off-by: Michal Leszczynski 
---
 tools/proctrace/Makefile|  48 +++
 tools/proctrace/proctrace.c | 163 
 2 files changed, 211 insertions(+)
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 tools/proctrace/proctrace.c

diff --git a/tools/proctrace/Makefile b/tools/proctrace/Makefile
new file mode 100644
index 00..2983c477fe
--- /dev/null
+++ b/tools/proctrace/Makefile
@@ -0,0 +1,48 @@
+# Copyright (C) CERT Polska - NASK PIB
+# Author: Michał Leszczyński 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; under version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+XEN_ROOT=$(CURDIR)/../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+CFLAGS  += -Werror
+CFLAGS  += $(CFLAGS_libxenevtchn)
+CFLAGS  += $(CFLAGS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenevtchn)
+LDLIBS  += $(LDLIBS_libxenforeignmemory)
+
+.PHONY: all
+all: build
+
+.PHONY: build
+build: proctrace
+
+.PHONY: install
+install: build
+   $(INSTALL_DIR) $(DESTDIR)$(sbindir)
+   $(INSTALL_PROG) proctrace $(DESTDIR)$(sbindir)/proctrace
+
+.PHONY: uninstall
+uninstall:
+   rm -f $(DESTDIR)$(sbindir)/proctrace
+
+.PHONY: clean
+clean:
+   $(RM) -f $(DEPS_RM)
+
+.PHONY: distclean
+distclean: clean
+
+proctrace: proctrace.o Makefile
+   $(CC) $(LDFLAGS) $< -o $@ $(LDLIBS) $(APPEND_LDFLAGS)
+
+-include $(DEPS_INCLUDE)
diff --git a/tools/proctrace/proctrace.c b/tools/proctrace/proctrace.c
new file mode 100644
index 00..22bf91db8d
--- /dev/null
+++ b/tools/proctrace/proctrace.c
@@ -0,0 +1,163 @@
+/**
+ * tools/proctrace.c
+ *
+ * Demonstrative tool for collecting Intel Processor Trace data from Xen.
+ *  Could be used to externally monitor a given vCPU in given DomU.
+ *
+ * Copyright (C) 2020 by CERT Polska - NASK PIB
+ *
+ * Authors: Michał Leszczyński, michal.leszczyn...@cert.pl
+ * Date:June, 2020
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; under version 2 of the License.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define BUF_SIZE (16384 * XC_PAGE_SIZE)
+
+volatile int interrupted = 0;
+
+void term_handler(int signum) {
+interrupted = 1;
+}
+
+int main(int argc, char* argv[]) {
+xc_interface *xc;
+uint32_t domid;
+uint32_t vcpu_id;
+
+int rc = -1;
+uint8_t *buf = NULL;
+uint64_t last_offset = 0;
+
+xenforeignmemory_handle *fmem;
+xenforeignmemory_resource_handle *fres;
+
+if (signal(SIGINT, term_handler) == SIG_ERR)
+{
+fprintf(stderr, "Failed to register signal handler\n");
+return 1;
+}
+
+if (argc != 3) {
+fprintf(stderr, "Usage: %s <domid> <vcpu_id>\n", argv[0]);
+fprintf(stderr, "It's recommended to redirect this "
+"program's output to file\n");
+fprintf(stderr, "or to pipe its output to xxd or other program.\n");
+return 1;
+}
+
+domid = atoi(argv[1]);
+vcpu_id = atoi(argv[2]);
+
+xc = xc_interface_open(0, 0, 0);
+
+fmem = xenforeignmemory_open(0, 0);
+
+if (!xc) {
+fprintf(stderr, "Failed to open xc interface\n");
+return 1;
+}
+
+rc = xc_vmtrace_pt_enable(xc, domid, vcpu_id);
+
+if (rc) {
+fprintf(stderr, "Failed to call xc_vmtrace_pt_enable\n");
+return 1;
+}
+
+fres = xenforeignmemory_map_resource(
+fmem, domid, XENMEM_resource_vmtrace_buf,
+/* vcpu: */ vcpu_id,
+/* frame: */ 0,
+/* num_frames: */ BUF_SIZE >> XC_PAGE_SHIFT,
+(void **)&buf,
+PROT_READ, 0);
+
+if (!buf) {
+fprintf(stderr, "Failed to map trace buffer\n");
+return 1;
+}
+
+while (!interrupted) {
+uint64_t offset;
+rc = xc_vmtrace_pt_get_

[PATCH v5 09/11] x86/domctl: add XEN_DOMCTL_vmtrace_op

2020-07-05 Thread Michał Leszczyński
From: Michal Leszczynski 

Implement domctl to manage the runtime state of
processor trace feature.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/domctl.c   | 48 +
 xen/include/public/domctl.h | 26 
 2 files changed, 74 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 6f2c69788d..a041b724d8 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -322,6 +322,48 @@ void arch_get_domain_info(const struct domain *d,
 info->arch_config.emulation_flags = d->arch.emulation_flags;
 }
 
+static int do_vmtrace_op(struct domain *d, struct xen_domctl_vmtrace_op *op,
+ XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
+{
+int rc;
+struct vcpu *v;
+
+if ( !vmtrace_supported )
+return -EOPNOTSUPP;
+
+if ( !is_hvm_domain(d) )
+return -EOPNOTSUPP;
+
+if ( op->vcpu >= d->max_vcpus )
+return -EINVAL;
+
+v = domain_vcpu(d, op->vcpu);
+rc = 0;
+
+switch ( op->cmd )
+{
+case XEN_DOMCTL_vmtrace_pt_enable:
+case XEN_DOMCTL_vmtrace_pt_disable:
+vcpu_pause(v);
+spin_lock(&d->vmtrace_lock);
+
+rc = vmtrace_control_pt(v, op->cmd == XEN_DOMCTL_vmtrace_pt_enable);
+
+spin_unlock(&d->vmtrace_lock);
+vcpu_unpause(v);
+break;
+
+case XEN_DOMCTL_vmtrace_pt_get_offset:
+rc = vmtrace_get_pt_offset(v, &op->offset);
+break;
+
+default:
+rc = -EOPNOTSUPP;
+}
+
+return rc;
+}
+
 #define MAX_IOPORTS 0x1
 
 long arch_do_domctl(
@@ -337,6 +379,12 @@ long arch_do_domctl(
 switch ( domctl->cmd )
 {
 
+case XEN_DOMCTL_vmtrace_op:
+ret = do_vmtrace_op(d, &domctl->u.vmtrace_op, u_domctl);
+if ( !ret )
+copyback = true;
+   break;
+
 case XEN_DOMCTL_shadow_op:
 ret = paging_domctl(d, &domctl->u.shadow_op, u_domctl, 0);
 if ( ret == -ERESTART )
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 7b8289d436..f836cb5970 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1136,6 +1136,28 @@ struct xen_domctl_vuart_op {
  */
 };
 
+/* XEN_DOMCTL_vmtrace_op: Perform VM tracing related operation */
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+
+struct xen_domctl_vmtrace_op {
+/* IN variable */
+uint32_t cmd;
+/* Enable/disable external vmtrace for given domain */
+#define XEN_DOMCTL_vmtrace_pt_enable  1
+#define XEN_DOMCTL_vmtrace_pt_disable 2
+#define XEN_DOMCTL_vmtrace_pt_get_offset  3
+domid_t domain;
+uint32_t vcpu;
+uint64_aligned_t size;
+
+/* OUT variable */
+uint64_aligned_t offset;
+};
+typedef struct xen_domctl_vmtrace_op xen_domctl_vmtrace_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_vmtrace_op_t);
+
+#endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
+
 struct xen_domctl {
 uint32_t cmd;
 #define XEN_DOMCTL_createdomain   1
@@ -1217,6 +1239,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_vuart_op  81
 #define XEN_DOMCTL_get_cpu_policy82
 #define XEN_DOMCTL_set_cpu_policy83
+#define XEN_DOMCTL_vmtrace_op84
 #define XEN_DOMCTL_gdbsx_guestmemio1000
 #define XEN_DOMCTL_gdbsx_pausevcpu 1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu   1002
@@ -1277,6 +1300,9 @@ struct xen_domctl {
 struct xen_domctl_monitor_opmonitor_op;
 struct xen_domctl_psr_alloc psr_alloc;
 struct xen_domctl_vuart_op  vuart_op;
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+struct xen_domctl_vmtrace_opvmtrace_op;
+#endif
 uint8_t pad[128];
 } u;
 };
-- 
2.17.1




[PATCH v5 06/11] x86/hvm: processor trace interface in HVM

2020-07-05 Thread Michał Leszczyński
From: Michal Leszczynski 

Implement necessary changes in common code/HVM to support
processor trace features. Define vmtrace_pt_* API and
implement trace buffer allocation/deallocation in common
code.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/domain.c | 19 +++
 xen/common/domain.c   | 19 +++
 xen/include/asm-x86/hvm/hvm.h | 20 
 xen/include/xen/sched.h   |  4 
 4 files changed, 62 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fee6c3931a..79c9794408 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2199,6 +2199,25 @@ int domain_relinquish_resources(struct domain *d)
 altp2m_vcpu_disable_ve(v);
 }
 
+for_each_vcpu ( d, v )
+{
+unsigned int i;
+
+if ( !v->vmtrace.pt_buf )
+continue;
+
+for ( i = 0; i < (v->domain->vmtrace_pt_size >> PAGE_SHIFT); i++ )
+{
+struct page_info *pg = mfn_to_page(
+mfn_add(page_to_mfn(v->vmtrace.pt_buf), i));
+if ( (pg->count_info & PGC_count_mask) != 1 )
+return -EBUSY;
+}
+
+free_domheap_pages(v->vmtrace.pt_buf,
+get_order_from_bytes(v->domain->vmtrace_pt_size));
+}
+
 if ( is_pv_domain(d) )
 {
 for_each_vcpu ( d, v )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 25d3359c5b..f480c4e033 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -137,6 +137,21 @@ static void vcpu_destroy(struct vcpu *v)
 free_vcpu_struct(v);
 }
 
+static int vmtrace_alloc_buffers(struct vcpu *v)
+{
+struct page_info *pg;
+uint64_t size = v->domain->vmtrace_pt_size;
+
+pg = alloc_domheap_pages(v->domain, get_order_from_bytes(size),
+ MEMF_no_refcount);
+
+if ( !pg )
+return -ENOMEM;
+
+v->vmtrace.pt_buf = pg;
+return 0;
+}
+
 struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
 {
 struct vcpu *v;
@@ -162,6 +177,9 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int 
vcpu_id)
 v->vcpu_id = vcpu_id;
 v->dirty_cpu = VCPU_CPU_CLEAN;
 
+if ( d->vmtrace_pt_size && vmtrace_alloc_buffers(v) != 0 )
+return NULL;
+
 spin_lock_init(&v->virq_lock);
 
 tasklet_init(&v->continue_hypercall_tasklet, NULL, NULL);
@@ -422,6 +440,7 @@ struct domain *domain_create(domid_t domid,
 d->shutdown_code = SHUTDOWN_CODE_INVALID;
 
 spin_lock_init(&d->pbuf_lock);
+spin_lock_init(&d->vmtrace_lock);
 
 rwlock_init(&d->vnuma_rwlock);
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1eb377dd82..2d474a4c50 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -214,6 +214,10 @@ struct hvm_function_table {
 bool_t (*altp2m_vcpu_emulate_ve)(struct vcpu *v);
 int (*altp2m_vcpu_emulate_vmfunc)(const struct cpu_user_regs *regs);
 
+/* vmtrace */
+int (*vmtrace_control_pt)(struct vcpu *v, bool enable);
+int (*vmtrace_get_pt_offset)(struct vcpu *v, uint64_t *offset);
+
 /*
  * Parameters and callbacks for hardware-assisted TSC scaling,
  * which are valid only when the hardware feature is available.
@@ -655,6 +659,22 @@ static inline bool altp2m_vcpu_emulate_ve(struct vcpu *v)
 return false;
 }
 
+static inline int vmtrace_control_pt(struct vcpu *v, bool enable)
+{
+if ( hvm_funcs.vmtrace_control_pt )
+return hvm_funcs.vmtrace_control_pt(v, enable);
+
+return -EOPNOTSUPP;
+}
+
+static inline int vmtrace_get_pt_offset(struct vcpu *v, uint64_t *offset)
+{
+if ( hvm_funcs.vmtrace_get_pt_offset )
+return hvm_funcs.vmtrace_get_pt_offset(v, offset);
+
+return -EOPNOTSUPP;
+}
+
 /*
  * This must be defined as a macro instead of an inline function,
  * because it uses 'struct vcpu' and 'struct domain' which have
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 48f0a61bbd..95ebab0d30 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -253,6 +253,10 @@ struct vcpu
 /* vPCI per-vCPU area, used to store data for long running operations. */
 struct vpci_vcpu vpci;
 
+struct {
+struct page_info *pt_buf;
+} vmtrace;
+
 struct arch_vcpu arch;
 };
 
-- 
2.17.1
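
The optional-hook pattern used by the vmtrace_control_pt() and vmtrace_get_pt_offset() wrappers in this patch can be tried outside of Xen. The following is a minimal sketch under stated assumptions: every demo_* and backend_* name is hypothetical and stands in for the real struct vcpu and struct hvm_function_table, which are much larger.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vcpu; only what the sketch needs. */
struct demo_vcpu {
    bool tracing;
    uint64_t offset;
};

/* Hypothetical, cut-down analogue of struct hvm_function_table. */
struct demo_hvm_funcs {
    int (*control_pt)(struct demo_vcpu *v, bool enable);
    int (*get_pt_offset)(struct demo_vcpu *v, uint64_t *offset);
};

/* Zero-initialised: no backend has registered any hook yet. */
static struct demo_hvm_funcs demo_funcs;

/* Call the hook if a backend provides it, otherwise report "unsupported". */
static int demo_control_pt(struct demo_vcpu *v, bool enable)
{
    if ( demo_funcs.control_pt )
        return demo_funcs.control_pt(v, enable);

    return -EOPNOTSUPP;
}

static int demo_get_pt_offset(struct demo_vcpu *v, uint64_t *offset)
{
    if ( demo_funcs.get_pt_offset )
        return demo_funcs.get_pt_offset(v, offset);

    return -EOPNOTSUPP;
}

/* Example backend, playing the role that VMX plays in the series. */
static int backend_control_pt(struct demo_vcpu *v, bool enable)
{
    v->tracing = enable;
    return 0;
}

static int backend_get_pt_offset(struct demo_vcpu *v, uint64_t *offset)
{
    *offset = v->offset;
    return 0;
}
```

With this shape, common code never needs to know which backend is present; a backend without processor trace support simply leaves the two pointers NULL.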




[PATCH v5 03/11] x86/vmx: add IPT cpu feature

2020-07-05 Thread Michał Leszczyński
From: Michal Leszczynski 

Check if Intel Processor Trace feature is supported by current
processor. Define vmtrace_supported global variable.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/hvm/vmx/vmcs.c | 15 ++-
 xen/common/domain.c |  2 ++
 xen/include/asm-x86/cpufeature.h|  1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  |  1 +
 xen/include/public/arch-x86/cpufeatureset.h |  1 +
 xen/include/xen/domain.h|  2 ++
 6 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index ca94c2bedc..3a53553f10 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -291,6 +291,20 @@ static int vmx_init_vmcs_config(void)
 _vmx_cpu_based_exec_control &=
 ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING);
 
+rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
+
+/* Check whether IPT is supported in VMX operation. */
+if ( !smp_processor_id() )
+vmtrace_supported = cpu_has_ipt &&
+(_vmx_misc_cap & VMX_MISC_PROC_TRACE);
+else if ( vmtrace_supported &&
+  !(_vmx_misc_cap & VMX_MISC_PROC_TRACE) )
+{
+printk("VMX: IPT capabilities fatally differ between CPU%u and CPU0\n",
+   smp_processor_id());
+return -EINVAL;
+}
+
 if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS )
 {
 min = 0;
@@ -305,7 +319,6 @@ static int vmx_init_vmcs_config(void)
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
SECONDARY_EXEC_XSAVES |
SECONDARY_EXEC_TSC_SCALING);
-rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
 if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
 opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
 if ( opt_vpid_enabled )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 7cc9526139..a45cf023f7 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -82,6 +82,8 @@ struct vcpu *idle_vcpu[NR_CPUS] __read_mostly;
 
 vcpu_info_t dummy_vcpu_info;
 
+bool vmtrace_supported __read_mostly;
+
 static void __domain_finalise_shutdown(struct domain *d)
 {
 struct vcpu *v;
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index f790d5c1f8..555f696a26 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -104,6 +104,7 @@
 #define cpu_has_clwbboot_cpu_has(X86_FEATURE_CLWB)
 #define cpu_has_avx512erboot_cpu_has(X86_FEATURE_AVX512ER)
 #define cpu_has_avx512cdboot_cpu_has(X86_FEATURE_AVX512CD)
+#define cpu_has_ipt boot_cpu_has(X86_FEATURE_PROC_TRACE)
 #define cpu_has_sha boot_cpu_has(X86_FEATURE_SHA)
 #define cpu_has_avx512bwboot_cpu_has(X86_FEATURE_AVX512BW)
 #define cpu_has_avx512vlboot_cpu_has(X86_FEATURE_AVX512VL)
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h 
b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 906810592f..6153ba6769 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -283,6 +283,7 @@ extern u32 vmx_secondary_exec_control;
 #define VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL 0x800ULL
 extern u64 vmx_ept_vpid_cap;
 
+#define VMX_MISC_PROC_TRACE 0x4000
 #define VMX_MISC_CR3_TARGET 0x01ff
 #define VMX_MISC_VMWRITE_ALL0x2000
 
diff --git a/xen/include/public/arch-x86/cpufeatureset.h 
b/xen/include/public/arch-x86/cpufeatureset.h
index fe7492a225..2c91862f2d 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -217,6 +217,7 @@ XEN_CPUFEATURE(SMAP,  5*32+20) /*S  Supervisor Mode 
Access Prevention */
 XEN_CPUFEATURE(AVX512_IFMA,   5*32+21) /*A  AVX-512 Integer Fused Multiply Add 
*/
 XEN_CPUFEATURE(CLFLUSHOPT,5*32+23) /*A  CLFLUSHOPT instruction */
 XEN_CPUFEATURE(CLWB,  5*32+24) /*A  CLWB instruction */
+XEN_CPUFEATURE(PROC_TRACE,5*32+25) /*   Processor Tracing feature */
 XEN_CPUFEATURE(AVX512PF,  5*32+26) /*A  AVX-512 Prefetch Instructions */
 XEN_CPUFEATURE(AVX512ER,  5*32+27) /*A  AVX-512 Exponent & Reciprocal 
Instrs */
 XEN_CPUFEATURE(AVX512CD,  5*32+28) /*A  AVX-512 Conflict Detection Instrs 
*/
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 7e51d361de..61ebc6c24d 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -130,4 +130,6 @@ struct vnuma_info {
 
 void vnuma_destroy(struct vnuma_info *vnuma);
 
+extern bool vmtrace_supported;
+
 #endif /* __XEN_DOMAIN_H__ */
-- 
2.17.1
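
The boot-CPU versus secondary-CPU check added to vmx_init_vmcs_config() can be modelled in isolation. This is only a sketch: demo_check_cpu() and the DEMO_* constant are hypothetical names, the MSR read is replaced by a plain parameter, and bit 14 is used because it matches the 0x4000 value of VMX_MISC_PROC_TRACE in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 14 of the VMX misc capability word, as in VMX_MISC_PROC_TRACE. */
#define DEMO_MISC_PROC_TRACE (UINT64_C(1) << 14)

/* Hypothetical analogue of the vmtrace_supported global. */
static bool demo_vmtrace_supported;

/*
 * CPU 0 decides whether the feature is available at all. Any later CPU
 * that lacks the capability while the feature is on is refused, so the
 * global flag is never turned off after boot.
 */
static int demo_check_cpu(unsigned int cpu, bool has_ipt, uint64_t misc_cap)
{
    if ( cpu == 0 )
        demo_vmtrace_supported = has_ipt &&
                                 (misc_cap & DEMO_MISC_PROC_TRACE);
    else if ( demo_vmtrace_supported &&
              !(misc_cap & DEMO_MISC_PROC_TRACE) )
        return -EINVAL;

    return 0;
}
```

A CPU without the capability is still accepted when the boot CPU did not enable the feature in the first place, which mirrors the else-if condition in the hunk above.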




[PATCH v5 07/11] x86/vmx: implement IPT in VMX

2020-07-05 Thread Michał Leszczyński
From: Michal Leszczynski 

Use Intel Processor Trace feature to provide vmtrace_pt_*
interface for HVM/VMX.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/hvm/vmx/vmx.c | 109 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   3 +
 xen/include/asm-x86/hvm/vmx/vmx.h  |  14 
 3 files changed, 126 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index cc6d4ece22..4eded2ef84 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -428,6 +428,56 @@ static void vmx_domain_relinquish_resources(struct domain 
*d)
 vmx_free_vlapic_mapping(d);
 }
 
+static int vmx_init_pt(struct vcpu *v)
+{
+int rc;
+uint64_t size = v->domain->vmtrace_pt_size;
+
+v->arch.hvm.vmx.ipt_state = xzalloc(struct ipt_state);
+
+if ( !v->arch.hvm.vmx.ipt_state )
+return -ENOMEM;
+
+if ( !v->vmtrace.pt_buf || !size )
+return -EINVAL;
+
+/*
+ * We don't accept trace buffer size smaller than single page
+ * and the upper bound is defined as 4GB in the specification.
+ * The buffer size must be also a power of 2.
+ */
+if ( size < PAGE_SIZE || size > GB(4) || (size & (size - 1)) )
+return -EINVAL;
+
+v->arch.hvm.vmx.ipt_state->output_base =
+page_to_maddr(v->vmtrace.pt_buf);
+v->arch.hvm.vmx.ipt_state->output_mask.raw = size - 1;
+
+rc = vmx_add_host_load_msr(v, MSR_RTIT_CTL, 0);
+
+if ( rc )
+return rc;
+
+rc = vmx_add_guest_msr(v, MSR_RTIT_CTL,
+  RTIT_CTL_TRACE_EN | RTIT_CTL_OS |
+  RTIT_CTL_USR | RTIT_CTL_BRANCH_EN);
+
+if ( rc )
+return rc;
+
+return 0;
+}
+
+static int vmx_destroy_pt(struct vcpu* v)
+{
+if ( v->arch.hvm.vmx.ipt_state )
+xfree(v->arch.hvm.vmx.ipt_state);
+
+v->arch.hvm.vmx.ipt_state = NULL;
+return 0;
+}
+
+
 static int vmx_vcpu_initialise(struct vcpu *v)
 {
 int rc;
@@ -471,6 +521,14 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 
 vmx_install_vlapic_mapping(v);
 
+if ( v->domain->vmtrace_pt_size )
+{
+rc = vmx_init_pt(v);
+
+if ( rc )
+return rc;
+}
+
 return 0;
 }
 
@@ -483,6 +541,7 @@ static void vmx_vcpu_destroy(struct vcpu *v)
  * prior to vmx_domain_destroy so we need to disable PML for each vcpu
  * separately here.
  */
+vmx_destroy_pt(v);
 vmx_vcpu_disable_pml(v);
 vmx_destroy_vmcs(v);
 passive_domain_destroy(v);
@@ -513,6 +572,18 @@ static void vmx_save_guest_msrs(struct vcpu *v)
  * be updated at any time via SWAPGS, which we cannot trap.
  */
 v->arch.hvm.vmx.shadow_gs = rdgsshadow();
+
+if ( unlikely(v->arch.hvm.vmx.ipt_state &&
+  v->arch.hvm.vmx.ipt_state->active) )
+{
+uint64_t rtit_ctl;
+rdmsrl(MSR_RTIT_CTL, rtit_ctl);
+BUG_ON(rtit_ctl & RTIT_CTL_TRACE_EN);
+
+rdmsrl(MSR_RTIT_STATUS, v->arch.hvm.vmx.ipt_state->status);
+rdmsrl(MSR_RTIT_OUTPUT_MASK,
+   v->arch.hvm.vmx.ipt_state->output_mask.raw);
+}
 }
 
 static void vmx_restore_guest_msrs(struct vcpu *v)
@@ -524,6 +595,17 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
 
 if ( cpu_has_msr_tsc_aux )
 wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
+
+if ( unlikely(v->arch.hvm.vmx.ipt_state &&
+  v->arch.hvm.vmx.ipt_state->active) )
+{
+wrmsrl(MSR_RTIT_OUTPUT_BASE,
+   v->arch.hvm.vmx.ipt_state->output_base);
+wrmsrl(MSR_RTIT_OUTPUT_MASK,
+   v->arch.hvm.vmx.ipt_state->output_mask.raw);
+wrmsrl(MSR_RTIT_STATUS,
+   v->arch.hvm.vmx.ipt_state->status);
+}
 }
 
 void vmx_update_cpu_exec_control(struct vcpu *v)
@@ -2240,6 +2322,24 @@ static bool vmx_get_pending_event(struct vcpu *v, struct 
x86_event *info)
 return true;
 }
 
+static int vmx_control_pt(struct vcpu *v, bool enable)
+{
+if ( !v->arch.hvm.vmx.ipt_state )
+return -EINVAL;
+
+v->arch.hvm.vmx.ipt_state->active = enable;
+return 0;
+}
+
+static int vmx_get_pt_offset(struct vcpu *v, uint64_t *offset)
+{
+if ( !v->arch.hvm.vmx.ipt_state )
+return -EINVAL;
+
+*offset = v->arch.hvm.vmx.ipt_state->output_mask.offset;
+return 0;
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
 .name = "VMX",
 .cpu_up_prepare   = vmx_cpu_up_prepare,
@@ -2295,6 +2395,8 @@ static struct hvm_function_table __initdata 
vmx_function_table = {
 .altp2m_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
 .altp2m_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
 .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
+.vmtrace_control_pt = vmx_control_pt,
+.vmtrace_get_pt_offset = vmx_get_pt_offset,
 .tsc_scaling = {
 .max_ratio = VMX_TSC_MULTIPLIER_MAX,
 },
@@ -3674,6 +3776,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
 

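The size checks in vmx_init_pt() (at least one page, at most 4GB, power of 2) and the derived output mask can be exercised on their own. A minimal sketch, assuming 4 KiB pages; demo_pt_mask() and the DEMO_* macros are hypothetical names and not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE UINT64_C(4096)          /* assumption: 4 KiB pages */
#define DEMO_GB(x)     ((uint64_t)(x) << 30)

/*
 * Validate a trace buffer size the same way the patch does and, on
 * success, return the mask that would be stored in output_mask.raw.
 */
static int demo_pt_mask(uint64_t size, uint64_t *mask)
{
    /* (size & (size - 1)) is non-zero unless size is a power of 2. */
    if ( size < DEMO_PAGE_SIZE || size > DEMO_GB(4) ||
         (size & (size - 1)) )
        return -EINVAL;

    *mask = size - 1;
    return 0;
}
```

For a power-of-2 size, size - 1 has all lower bits set, which is why it can serve directly as the wrap-around mask for the output region.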
[PATCH v5 08/11] x86/mm: add vmtrace_buf resource type

2020-07-05 Thread Michał Leszczyński
From: Michal Leszczynski 

Allow mapping the processor trace buffer using
acquire_resource().

Signed-off-by: Michal Leszczynski 
---
 xen/common/memory.c | 28 
 xen/include/public/memory.h |  1 +
 2 files changed, 29 insertions(+)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index eb42f883df..04f4e152c0 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1007,6 +1007,29 @@ static long xatp_permission_check(struct domain *d, 
unsigned int space)
 return xsm_add_to_physmap(XSM_TARGET, current->domain, d);
 }
 
+static int acquire_vmtrace_buf(struct domain *d, unsigned int id,
+   unsigned long frame,
+   unsigned int nr_frames,
+   xen_pfn_t mfn_list[])
+{
+mfn_t mfn;
+unsigned int i;
+struct vcpu *v = domain_vcpu(d, id);
+
+if ( !v || !v->vmtrace.pt_buf )
+return -EINVAL;
+
+mfn = page_to_mfn(v->vmtrace.pt_buf);
+
+if ( frame + nr_frames > (v->domain->vmtrace_pt_size >> PAGE_SHIFT) )
+return -EINVAL;
+
+for ( i = 0; i < nr_frames; i++ )
+mfn_list[i] = mfn_x(mfn_add(mfn, frame + i));
+
+return 0;
+}
+
 static int acquire_grant_table(struct domain *d, unsigned int id,
unsigned long frame,
unsigned int nr_frames,
@@ -1117,6 +1140,11 @@ static int acquire_resource(
  mfn_list);
 break;
 
+case XENMEM_resource_vmtrace_buf:
+rc = acquire_vmtrace_buf(d, xmar.id, xmar.frame, xmar.nr_frames,
+ mfn_list);
+break;
+
 default:
 rc = arch_acquire_resource(d, xmar.type, xmar.id, xmar.frame,
xmar.nr_frames, mfn_list);
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 21057ed78e..f4c905a10e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -625,6 +625,7 @@ struct xen_mem_acquire_resource {
 
 #define XENMEM_resource_ioreq_server 0
 #define XENMEM_resource_grant_table 1
+#define XENMEM_resource_vmtrace_buf 2
 
 /*
  * IN - a type-specific resource identifier, which must be zero
-- 
2.17.1




[PATCH v5 05/11] tools/libxl: add vmtrace_pt_size parameter

2020-07-05 Thread Michał Leszczyński
From: Michal Leszczynski 

Allow specifying the size of the per-vCPU trace buffer upon
domain creation. This is zero by default (meaning: not enabled).

Signed-off-by: Michal Leszczynski 
---
 docs/man/xl.cfg.5.pod.in | 11 +++
 tools/golang/xenlight/helpers.gen.go |  2 ++
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/libxl/libxl.h  |  8 
 tools/libxl/libxl_create.c   |  1 +
 tools/libxl/libxl_types.idl  |  2 ++
 tools/xl/xl_parse.c  | 22 ++
 7 files changed, 47 insertions(+)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 0532739c1f..670759f6bd 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -278,6 +278,17 @@ memory=8096 will report significantly less memory 
available for use
 than a system with maxmem=8096 memory=8096 due to the memory overhead
 of having to track the unused pages.
 
+=item B
+
+Specifies the size of processor trace buffer that would be allocated
+for each vCPU belonging to this domain. Disabled (i.e.
+B by default. This must be set to
+non-zero value in order to be able to use processor tracing features
+with this domain.
+
+B: The size value must be between 4 kB and 4 GB and it must
+be also a power of 2.
+
 =back
 
 =head3 Guest Virtual NUMA Configuration
diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index 152c7e8e6b..bfc37b69c8 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1117,6 +1117,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.Altp2M = Altp2MMode(xc.altp2m)
+x.VmtracePtOrder = int(xc.vmtrace_pt_order)
 
  return nil}
 
@@ -1592,6 +1593,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.altp2m = C.libxl_altp2m_mode(x.Altp2M)
+xc.vmtrace_pt_order = C.int(x.VmtracePtOrder)
 
  return nil
  }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index 663c1e86b4..f9b07ac862 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -516,6 +516,7 @@ GicVersion GicVersion
 Vuart VuartType
 }
 Altp2M Altp2MMode
+VmtracePtOrder int
 }
 
 type domainBuildInfoTypeUnion interface {
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 1cd6c38e83..4abb521756 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -438,6 +438,14 @@
  */
 #define LIBXL_HAVE_CREATEINFO_PASSTHROUGH 1
 
+/*
+ * LIBXL_HAVE_VMTRACE_PT_ORDER indicates that
+ * libxl_domain_create_info has a vmtrace_pt_order parameter, which
+ * allows to enable pre-allocation of processor tracing buffers
+ * with the given order of size.
+ */
+#define LIBXL_HAVE_VMTRACE_PT_ORDER 1
+
 /*
  * libxl ABI compatibility
  *
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 2814818e34..82b595161a 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -608,6 +608,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config 
*d_config,
 .max_evtchn_port = b_info->event_channels,
 .max_grant_frames = b_info->max_grant_frames,
 .max_maptrack_frames = b_info->max_maptrack_frames,
+.vmtrace_pt_order = b_info->vmtrace_pt_order,
 };
 
 if (info->type != LIBXL_DOMAIN_TYPE_PV) {
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9d3f05f399..1c5dd43e4d 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -645,6 +645,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
 # supported by x86 HVM and ARM support is planned.
 ("altp2m", libxl_altp2m_mode),
 
+("vmtrace_pt_order", integer),
+
 ], dir=DIR_IN,
copy_deprecated_fn="libxl__domain_build_info_copy_deprecated",
 )
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 61b4ef7b7e..279f7c14d3 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1861,6 +1861,28 @@ void parse_config_data(const char *config_source,
 }
 }
 
+if (!xlu_cfg_get_long(config, "processor_trace_buffer_size", &l, 1) && l) {
+int32_t shift = 0;
+
+if (l & (l - 1))
+{
+fprintf(stderr, "ERROR: processor_trace_buffer_size "
+   "- must be a power of 2\n");
+exit(1);
+}
+
+while (l >>= 1) ++shift;
+
+if (shift <= XEN_PAGE_SHIFT)
+{
+fprintf(stderr, "ERROR: processor_trace_buffer_size "
+   "- value is too small\n");
+exit(1);
+}
+
+b_info->vmtrace_pt_order = shift - XEN_PAGE_SHIFT;
+}
+
 if (!xlu_cfg_get_list(config, "ioports", &ioports, &num_ioports, 0)) {
 
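The conversion from a byte size to a page order done in the xl_parse.c hunk can be checked standalone. This is a sketch under assumptions: demo_size_to_order() and DEMO_PAGE_SHIFT are hypothetical names, a 12-bit page shift is assumed, and errors are returned instead of calling exit(). It accepts a size of exactly one page, in line with the "between 4 kB and 4 GB" wording in the documentation hunk, whereas the posted `shift <= XEN_PAGE_SHIFT` test would, as written, reject 4096.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12u   /* assumption: 4 KiB pages */

/* Convert a power-of-2 byte size into a page order (log2 of the page count). */
static int demo_size_to_order(uint64_t size, unsigned int *order)
{
    unsigned int shift = 0;

    /* Zero and non-powers of 2 are rejected up front. */
    if ( size == 0 || (size & (size - 1)) )
        return -EINVAL;

    /* Count how many times the size can be halved: this is log2(size). */
    while ( (size >>= 1) != 0 )
        ++shift;

    if ( shift < DEMO_PAGE_SHIFT )
        return -EINVAL;

    *order = shift - DEMO_PAGE_SHIFT;
    return 0;
}
```

For example, a 1 MiB buffer has shift 20 and therefore order 8, that is, 256 pages per vCPU.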

[PATCH v5 02/11] x86/vmx: add Intel PT MSR definitions

2020-07-05 Thread Michał Leszczyński
From: Michal Leszczynski 

Define constants related to Intel Processor Trace features.

Signed-off-by: Michal Leszczynski 
Acked-by: Andrew Cooper 
---
 xen/include/asm-x86/msr-index.h | 24 
 1 file changed, 24 insertions(+)

diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 0fe98af923..4fd54fb5c9 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -72,7 +72,31 @@
 #define MSR_RTIT_OUTPUT_BASE0x0560
 #define MSR_RTIT_OUTPUT_MASK0x0561
 #define MSR_RTIT_CTL0x0570
+#define  RTIT_CTL_TRACE_EN  (_AC(1, ULL) <<  0)
+#define  RTIT_CTL_CYC_EN(_AC(1, ULL) <<  1)
+#define  RTIT_CTL_OS(_AC(1, ULL) <<  2)
+#define  RTIT_CTL_USR   (_AC(1, ULL) <<  3)
+#define  RTIT_CTL_PWR_EVT_EN(_AC(1, ULL) <<  4)
+#define  RTIT_CTL_FUP_ON_PTW(_AC(1, ULL) <<  5)
+#define  RTIT_CTL_FABRIC_EN (_AC(1, ULL) <<  6)
+#define  RTIT_CTL_CR3_FILTER(_AC(1, ULL) <<  7)
+#define  RTIT_CTL_TOPA  (_AC(1, ULL) <<  8)
+#define  RTIT_CTL_MTC_EN(_AC(1, ULL) <<  9)
+#define  RTIT_CTL_TSC_EN(_AC(1, ULL) << 10)
+#define  RTIT_CTL_DIS_RETC  (_AC(1, ULL) << 11)
+#define  RTIT_CTL_PTW_EN(_AC(1, ULL) << 12)
+#define  RTIT_CTL_BRANCH_EN (_AC(1, ULL) << 13)
+#define  RTIT_CTL_MTC_FREQ  (_AC(0xf, ULL) << 14)
+#define  RTIT_CTL_CYC_THRESH(_AC(0xf, ULL) << 19)
+#define  RTIT_CTL_PSB_FREQ  (_AC(0xf, ULL) << 24)
+#define  RTIT_CTL_ADDR(n)   (_AC(0xf, ULL) << (32 + 4 * (n)))
 #define MSR_RTIT_STATUS 0x0571
+#define  RTIT_STATUS_FILTER_EN  (_AC(1, ULL) <<  0)
+#define  RTIT_STATUS_CONTEXT_EN (_AC(1, ULL) <<  1)
+#define  RTIT_STATUS_TRIGGER_EN (_AC(1, ULL) <<  2)
+#define  RTIT_STATUS_ERROR  (_AC(1, ULL) <<  4)
+#define  RTIT_STATUS_STOPPED(_AC(1, ULL) <<  5)
+#define  RTIT_STATUS_BYTECNT(_AC(0x1, ULL) << 32)
 #define MSR_RTIT_CR3_MATCH  0x0572
 #define MSR_RTIT_ADDR_A(n) (0x0580 + (n) * 2)
 #define MSR_RTIT_ADDR_B(n) (0x0581 + (n) * 2)
-- 
2.17.1




[PATCH v5 00/11] Implement support for external IPT monitoring

2020-07-05 Thread Michał Leszczyński
Intel Processor Trace is an architectural extension available in modern Intel 
CPUs. It records a detailed trace of activity while the processor executes 
code. The recorded trace can be used to reconstruct the code flow, that is, 
to find out which code paths were executed, which branches were taken, and 
so forth.

The abovementioned feature is described in Intel(R) 64 and IA-32 Architectures 
Software Developer's Manual Volume 3C: System Programming Guide, Part 3, 
Chapter 36: "Intel Processor Trace."

This patch series implements an interface that Dom0 could use in order to 
enable IPT for particular vCPUs in DomU, allowing for external monitoring. Such 
a feature has numerous applications like malware monitoring, fuzzing, or 
performance testing.

Also thanks to Tamas K Lengyel for a few preliminary hints before
first version of this patch was submitted to xen-devel.

Changed since v1:
  * MSR_RTIT_CTL is managed using MSR load lists
  * other PT-related MSRs are modified only when vCPU goes out of context
  * trace buffer is now acquired as a resource
  * added vmtrace_pt_size parameter in xl.cfg, the size of trace buffer
must be specified at domain creation time
  * trace buffers are allocated on domain creation, destructed on
domain destruction
  * HVMOP_vmtrace_ipt_enable/disable is limited to enabling/disabling PT;
these calls don't manage buffer memory anymore
  * lifted 32 MFN/GFN array limit when acquiring resources
  * minor code style changes according to review

Changed since v2:
  * trace buffer is now allocated on domain creation (in v2 it was
allocated when hvm param was set)
  * restored 32-item limit in mfn/gfn arrays in acquire_resource
and instead implemented hypercall continuations
  * code changes according to Jan's and Roger's review

Changed since v3:
  * vmtrace HVMOPs are now implemented as DOMCTLs
  * patches split up according to Andrew's comments
  * code changes according to v3 review on the mailing list

Changed since v4:
  * rebased to commit be63d9d4
  * fixed dependencies between patches
(earlier patches don't reference further patches)
  * introduced preemption check in acquire_resource
  * moved buffer allocation to common code
  * split some patches according to code review
  * minor fixes according to code review

This patch series is available on GitHub:
https://github.com/icedevml/xen/tree/ipt-patch-v5


Michal Leszczynski (11):
  memory: batch processing in acquire_resource()
  x86/vmx: add Intel PT MSR definitions
  x86/vmx: add IPT cpu feature
  common: add vmtrace_pt_size domain parameter
  tools/libxl: add vmtrace_pt_size parameter
  x86/hvm: processor trace interface in HVM
  x86/vmx: implement IPT in VMX
  x86/mm: add vmtrace_buf resource type
  x86/domctl: add XEN_DOMCTL_vmtrace_op
  tools/libxc: add xc_vmtrace_* functions
  tools/proctrace: add proctrace tool

 docs/man/xl.cfg.5.pod.in|  11 ++
 tools/golang/xenlight/helpers.gen.go|   2 +
 tools/golang/xenlight/types.gen.go  |   1 +
 tools/libxc/Makefile|   1 +
 tools/libxc/include/xenctrl.h   |  39 +
 tools/libxc/xc_vmtrace.c|  73 +
 tools/libxl/libxl.h |   8 +
 tools/libxl/libxl_create.c  |   1 +
 tools/libxl/libxl_types.idl |   2 +
 tools/proctrace/Makefile|  48 ++
 tools/proctrace/proctrace.c | 163 
 tools/xl/xl_parse.c |  22 +++
 xen/arch/x86/domain.c   |  19 +++
 xen/arch/x86/domctl.c   |  48 ++
 xen/arch/x86/hvm/vmx/vmcs.c |  15 +-
 xen/arch/x86/hvm/vmx/vmx.c  | 109 +
 xen/common/domain.c |  33 
 xen/common/memory.c |  77 -
 xen/include/asm-x86/cpufeature.h|   1 +
 xen/include/asm-x86/hvm/hvm.h   |  20 +++
 xen/include/asm-x86/hvm/vmx/vmcs.h  |   4 +
 xen/include/asm-x86/hvm/vmx/vmx.h   |  14 ++
 xen/include/asm-x86/msr-index.h |  24 +++
 xen/include/public/arch-x86/cpufeatureset.h |   1 +
 xen/include/public/domctl.h |  27 
 xen/include/public/memory.h |   1 +
 xen/include/xen/domain.h|   2 +
 xen/include/xen/sched.h |   8 +
 28 files changed, 768 insertions(+), 6 deletions(-)
 create mode 100644 tools/libxc/xc_vmtrace.c
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 tools/proctrace/proctrace.c

-- 
2.17.1
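
The changelog mentions keeping the 32-item limit on the mfn/gfn arrays and handling larger requests in batches. The loop below is only an illustration of that idea, not the code from patch 01: demo_process_batched() and DEMO_BATCH are hypothetical names, and the point where a real implementation could check for preemption and create a hypercall continuation is marked with a comment.

```c
#include <assert.h>

#define DEMO_BATCH 32u   /* per-iteration limit taken from the changelog */

/*
 * Walk nr_frames in chunks of at most DEMO_BATCH. Returns the number of
 * batches and reports the largest batch seen through *max_batch.
 */
static unsigned int demo_process_batched(unsigned int nr_frames,
                                         unsigned int *max_batch)
{
    unsigned int done = 0, batches = 0;

    *max_batch = 0;

    while ( done < nr_frames )
    {
        unsigned int todo = nr_frames - done;

        if ( todo > DEMO_BATCH )
            todo = DEMO_BATCH;

        /* A real implementation would translate `todo` frames here and
         * could yield between batches before continuing. */

        if ( todo > *max_batch )
            *max_batch = todo;

        done += todo;
        ++batches;
    }

    return batches;
}
```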




Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature

2020-07-04 Thread Michał Leszczyński
On 3 Jul 2020, at 9:58, Julien Grall jul...@xen.org wrote:

> Hi,
> 
> On 02/07/2020 21:28, Michał Leszczyński wrote:
>> On 2 Jul 2020, at 16:31, Julien Grall jul...@xen.org wrote:
>> 
>>> On 02/07/2020 15:17, Jan Beulich wrote:
>>>> On 02.07.2020 16:14, Julien Grall wrote:
>>>>> On 02/07/2020 14:30, Jan Beulich wrote:
>>>>>> On 02.07.2020 11:57, Julien Grall wrote:
>>>>>>> On 02/07/2020 10:18, Jan Beulich wrote:
>>>>>>>> On 02.07.2020 10:54, Julien Grall wrote:
>>>>>>>>> On 02/07/2020 09:50, Jan Beulich wrote:
>>>>>>>>>> On 02.07.2020 10:42, Julien Grall wrote:
>>>>>>>>>>> On 02/07/2020 09:29, Jan Beulich wrote:
>>>>>>> Another way to do it, would be the toolstack to do the mapping. At which
>>>>>>> point, you still need an hypercall to do the mapping (probably the
>>>>>>> hypercall acquire).
>>>>>>
>>>>>> There may not be any mapping to do in such a contrived, fixed-range
>>>>>> environment. This scenario was specifically to demonstrate that the
>>>>>> way the mapping gets done may be arch-specific (here: a no-op)
>>>>>> despite the allocation not being so.
>>>>> You are arguing on extreme cases which I don't think is really helpful
>>>>> here. Yes if you want to map at a fixed address in a guest you may not
>>>>> need the acquire hypercall. But in most of the other cases (see has for
>>>>> the tools) you will need it.
>>>>>
>>>>> So what's the problem with requesting to have the acquire hypercall
>>>>> implemented in common code?
>>>>
>>>> Didn't we start out by you asking that there be as little common code
>>>> as possible for the time being?
>>>
>>> Well as I said I am not in favor of having the allocation in common
>>> code, but if you want to keep it then you also want to implement
>>> map/unmap in the common code ([1], [2]).
>>>
>>>> I have no issue with putting the
>>>> acquire implementation there ...
>>> This was definitely not clear given how you argued with extreme cases...
>>>
>>> Cheers,
>>>
>>> [1] <9a3f4d58-e5ad-c7a1-6c5f-42aa92101...@xen.org>
>>> [2] 
>>>
>>> --
>>> Julien Grall
>> 
>> 
>> Guys,
>> 
>> could you express your final decision on this topic?
> 
> Can you move the acquire implementation from x86 to common code?
> 
> Cheers,
> 
> --
> Julien Grall


Ok, sure. This will be done in v5 of the patch.

Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature

2020-07-02 Thread Michał Leszczyński
On 2 Jul 2020, at 10:34, Jan Beulich jbeul...@suse.com wrote:

> On 02.07.2020 10:10, Roger Pau Monné wrote:
>> On Wed, Jul 01, 2020 at 10:42:55PM +0100, Andrew Cooper wrote:
>>> On 30/06/2020 13:33, Michał Leszczyński wrote:
>>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>>>> index ca94c2bedc..b73d824357 100644
>>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>>> @@ -291,6 +291,12 @@ static int vmx_init_vmcs_config(void)
>>>>  _vmx_cpu_based_exec_control &=
>>>>  ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING);
>>>>  
>>>> +rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
>>>> +
>>>> +/* Check whether IPT is supported in VMX operation. */
>>>> +vmtrace_supported = cpu_has_ipt &&
>>>> +(_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
>>>
>>> There is a subtle corner case here.  vmx_init_vmcs_config() is called on
>>> all CPUs, and is supposed to level things down safely if we find any
>>> asymmetry.
>>>
>>> If instead you go with something like this:
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>>> index b73d824357..6960109183 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>> @@ -294,8 +294,8 @@ static int vmx_init_vmcs_config(void)
>>>  rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
>>>  
>>>  /* Check whether IPT is supported in VMX operation. */
>>> -    vmtrace_supported = cpu_has_ipt &&
>>> -    (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
>>> +    if ( !(_vmx_misc_cap & VMX_MISC_PT_SUPPORTED) )
>>> +    vmtrace_supported = false;
>> 
>> This is also used during hotplug, so I'm not sure it's safe to turn
>> vmtrace_supported off during runtime, where VMs might be already using
>> it. IMO it would be easier to just set it on the BSP, and then refuse
>> to bring up any AP that doesn't have the feature.
> 
> +1
> 
> IOW I also don't think that "vmx_init_vmcs_config() ... is supposed to
> level things down safely". Instead I think the expectation is for
> CPU onlining to fail if a CPU lacks features compared to the BSP. As
> can be implied from what Roger says, doing like what you suggest may
> be fine during boot, but past that only at times where we know there's
> no user of a certain feature, and where discarding the feature flag
> won't lead to other inconsistencies (which may very well mean "never").
> 
> Jan


Ok, I will modify it in the way Roger suggested for the previous patch
version. CPU onlining will fail if there is an inconsistency.

Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v4 02/10] x86/vmx: add IPT cpu feature

2020-07-02 Thread Michał Leszczyński
On 2 Jul 2020, at 16:31, Julien Grall jul...@xen.org wrote:

> On 02/07/2020 15:17, Jan Beulich wrote:
>> On 02.07.2020 16:14, Julien Grall wrote:
>>> On 02/07/2020 14:30, Jan Beulich wrote:
>>>> On 02.07.2020 11:57, Julien Grall wrote:
>>>>> On 02/07/2020 10:18, Jan Beulich wrote:
>>>>>> On 02.07.2020 10:54, Julien Grall wrote:
>>>>>>> On 02/07/2020 09:50, Jan Beulich wrote:
>>>>>>>> On 02.07.2020 10:42, Julien Grall wrote:
>>>>>>>>> On 02/07/2020 09:29, Jan Beulich wrote:
>>>>> Another way to do it, would be the toolstack to do the mapping. At which
>>>>> point, you still need an hypercall to do the mapping (probably the
>>>>> hypercall acquire).
>>>>
>>>> There may not be any mapping to do in such a contrived, fixed-range
>>>> environment. This scenario was specifically to demonstrate that the
>>>> way the mapping gets done may be arch-specific (here: a no-op)
>>>> despite the allocation not being so.
>>> You are arguing on extreme cases which I don't think is really helpful
>>> here. Yes if you want to map at a fixed address in a guest you may not
>>> need the acquire hypercall. But in most of the other cases (such as for
>>> the tools) you will need it.
>>>
>>> So what's the problem with requesting to have the acquire hypercall
>>> implemented in common code?
>> 
>> Didn't we start out by you asking that there be as little common code
>> as possible for the time being?
> 
> Well as I said I am not in favor of having the allocation in common
> code, but if you want to keep it then you also want to implement
> map/unmap in the common code ([1], [2]).
> 
>> I have no issue with putting the
>> acquire implementation there ...
> This was definitely not clear given how you argued with extreme cases...
> 
> Cheers,
> 
> [1] <9a3f4d58-e5ad-c7a1-6c5f-42aa92101...@xen.org>
> [2] 
> 
> --
> Julien Grall


Guys,

could you express your final decision on this topic?

While I understand the discussion and the arguments you've raised,
I would like to know what particular elements should be moved where.

So are we going the abstract way, or the non-abstract, x86-only way?

Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter

2020-07-02 Thread Michał Leszczyński
- On 2 Jul 2020 at 11:00, Roger Pau Monné roger@citrix.com wrote:

> On Tue, Jun 30, 2020 at 02:33:46PM +0200, Michał Leszczyński wrote:
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index 59bdc28c89..7b8289d436 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
>>  uint32_t max_evtchn_port;
>>  int32_t max_grant_frames;
>>  int32_t max_maptrack_frames;
>> +uint8_t vmtrace_pt_order;
> 
> I've been thinking about this, and even though this is a domctl (so
> not a stable interface) we might want to consider using a size (or a
> number of pages) here rather than an order. IPT also supports
> TOPA mode (kind of a linked list of buffers) that would allow for
> sizes not rounded to order boundaries to be used, since then only each
> item in the linked list needs to be rounded to an order boundary, so
> you could for example use three 4K pages in TOPA mode AFAICT.
> 
> Roger.

In previous versions it was "size" but it was requested to change it
to "order" in order to shrink the variable size from uint64_t to
uint8_t, because there is limited space for xen_domctl_createdomain
structure.
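
For reference, converting between the two representations is cheap. A rough
standalone sketch follows; PT_PAGE_SHIFT, pt_size_to_order() and
pt_order_to_size() are illustrative names and not part of the patch:

```c
#include <stdint.h>

#define PT_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/*
 * Convert a buffer size in bytes to a page order.
 * Returns -1 unless size is a power of two between 4 KiB and 4 GiB.
 */
static int pt_size_to_order(uint64_t size)
{
    int shift = 0;

    if ( size < (UINT64_C(1) << PT_PAGE_SHIFT) || size > (UINT64_C(1) << 32) )
        return -1;

    if ( size & (size - 1) )
        return -1;

    while ( size >>= 1 )
        ++shift;

    return shift - PT_PAGE_SHIFT;
}

/* Inverse direction; returns 0 for orders above the 4 GiB limit. */
static uint64_t pt_order_to_size(uint8_t order)
{
    if ( order > 32 - PT_PAGE_SHIFT )
        return 0;

    return UINT64_C(1) << (order + PT_PAGE_SHIFT);
}
```

With helpers like these the domctl field can stay a uint8_t order while the
user-facing option remains a size in bytes. This does not cover the ToPA
case Roger mentions, where the total size need not be a power of two.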

How should I proceed?

Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v4 01/10] x86/vmx: add Intel PT MSR definitions

2020-06-30 Thread Michał Leszczyński
- On 30 Jun 2020 at 20:03, Tamas K Lengyel tamas.k.leng...@gmail.com wrote:

> On Tue, Jun 30, 2020 at 11:39 AM Andrew Cooper
>  wrote:
>>
>> On 30/06/2020 13:33, Michał Leszczyński wrote:
>> > diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
>> > index b328a47ed8..0203029be9 100644
>> > --- a/xen/include/asm-x86/msr-index.h
>> > +++ b/xen/include/asm-x86/msr-index.h
>> > @@ -69,6 +69,43 @@
>> >  #define MSR_MCU_OPT_CTRL                    0x0123
>> >  #define  MCU_OPT_CTRL_RNGDS_MITG_DIS        (_AC(1, ULL) <<  0)
>> >
>> > +/* Intel PT MSRs */
>> > +#define MSR_RTIT_OUTPUT_BASE                0x0560
>> > +
>> > +#define MSR_RTIT_OUTPUT_MASK                0x0561
>> > +
>> > +#define MSR_RTIT_CTL                        0x0570
>> > +#define  RTIT_CTL_TRACEEN                   (_AC(1, ULL) <<  0)
>> > +#define  RTIT_CTL_CYCEN                     (_AC(1, ULL) <<  1)
>>
>> In addition to what Jan has said, please can we be consistent with an
>> underscore (or not) before EN.  Preferably with, so these would become
>> TRACE_EN and CYC_EN.
>>
>> That said, there are a lot of bit definitions which aren't used at all.
>> IMO, it would be better to introduce defines when you use them.
> 
> In the past I found it very valuable when this type of plumbing was
> already present in Xen instead of me having to go into the SDM to dig
> out the magic numbers. So while some of the bits might not be used
> right now I also don't see any downside in having them, just in case.
> 
> Tamas


+1 for keeping the unused #defines; this is a helpful piece of knowledge
which speeds up further patch development. It affects neither compilation
time nor runtime, and it doesn't occupy much space in the code, so I would
opt to keep them.

I will rebase this series onto latest master within patch v5. The remaining
patches in this series are not affected and still could be reviewed,
so I will wait a few days before posting the new version.


Best regards,
Michał Leszczyński
CERT Polska



[PATCH v4 06/10] memory: batch processing in acquire_resource()

2020-06-30 Thread Michał Leszczyński
From: Michal Leszczynski 

Allow acquiring large resources by letting acquire_resource()
process items in batches, using hypercall continuation.

Signed-off-by: Michal Leszczynski 
---
 xen/common/memory.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c1e5..3ab06581a2 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1046,10 +1046,12 @@ static int acquire_grant_table(struct domain *d, unsigned int id,
 }
 
 static int acquire_resource(
-XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
+XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg,
+unsigned long *start_extent)
 {
 struct domain *d, *currd = current->domain;
 xen_mem_acquire_resource_t xmar;
+uint32_t total_frames;
 /*
  * The mfn_list and gfn_list (below) arrays are ok on stack for the
  * moment since they are small, but if they need to grow in future
@@ -1077,8 +1079,17 @@ static int acquire_resource(
 return 0;
 }
 
+total_frames = xmar.nr_frames;
+
+if ( *start_extent )
+{
+xmar.frame += *start_extent;
+xmar.nr_frames -= *start_extent;
+guest_handle_add_offset(xmar.frame_list, *start_extent);
+}
+
 if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
-return -E2BIG;
+xmar.nr_frames = ARRAY_SIZE(mfn_list);
 
 rc = rcu_lock_remote_domain_by_id(xmar.domid, &d);
 if ( rc )
@@ -1135,6 +1146,14 @@ static int acquire_resource(
 }
 }
 
+if ( !rc )
+{
+*start_extent += xmar.nr_frames;
+
+if ( *start_extent != total_frames )
+rc = -ERESTART;
+}
+
  out:
 rcu_unlock_domain(d);
 
@@ -1600,7 +1619,14 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 case XENMEM_acquire_resource:
 rc = acquire_resource(
-guest_handle_cast(arg, xen_mem_acquire_resource_t));
+guest_handle_cast(arg, xen_mem_acquire_resource_t),
+&start_extent);
+
+if ( rc == -ERESTART )
+return hypercall_create_continuation(
+__HYPERVISOR_memory_op, "lh",
+op | (start_extent << MEMOP_EXTENT_SHIFT), arg);
+
 break;
 
 default:
-- 
2.20.1
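
To illustrate the batching scheme this patch describes, here is a simplified
standalone model. It is a sketch under assumptions, not the patch code:
BATCH_SIZE stands in for ARRAY_SIZE(mfn_list), MODEL_ERESTART is an
arbitrary local constant, and struct request, acquire_batch() and
acquire_all() are hypothetical names. The loop in acquire_all() plays the
role that hypercall_create_continuation() plays in the hypervisor.

```c
#define BATCH_SIZE 32      /* stands in for ARRAY_SIZE(mfn_list) */
#define MODEL_ERESTART 85  /* illustrative value only */

struct request {
    unsigned long frame;     /* first frame requested */
    unsigned int nr_frames;  /* total number of frames requested */
};

/*
 * Handle at most BATCH_SIZE frames, starting *start_extent frames into the
 * request. Returns 0 once the whole request is done, -MODEL_ERESTART when
 * the caller should call again with the updated *start_extent.
 */
static int acquire_batch(const struct request *req,
                         unsigned long *start_extent,
                         unsigned long *frame_list)
{
    unsigned int todo, i;

    if ( *start_extent > req->nr_frames )
        return -1;  /* continuation offset lies beyond the request */

    todo = req->nr_frames - *start_extent;
    if ( todo > BATCH_SIZE )
        todo = BATCH_SIZE;

    for ( i = 0; i < todo; i++ )
        frame_list[*start_extent + i] = req->frame + *start_extent + i;

    *start_extent += todo;

    return (*start_extent != req->nr_frames) ? -MODEL_ERESTART : 0;
}

/* Driver loop; returns the number of calls needed, or 0 on error. */
static unsigned int acquire_all(const struct request *req,
                                unsigned long *frame_list)
{
    unsigned long start_extent = 0;
    unsigned int calls = 0;
    int rc;

    do {
        rc = acquire_batch(req, &start_extent, frame_list);
        calls++;
    } while ( rc == -MODEL_ERESTART );

    return rc ? 0 : calls;
}
```

In this model a request for 70 frames completes in three calls (32 + 32 + 6),
with the progress carried only in start_extent between calls.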




[PATCH v4 10/10] tools/proctrace: add proctrace tool

2020-06-30 Thread Michał Leszczyński
M IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+END OF TERMS AND CONDITIONS
+
+   How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+
+Copyright (C)   
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; If not, see <http://www.gnu.org/licenses/>.
+
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+Gnomovision version 69, Copyright (C) year name of author
+Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+This is free software, and you are welcome to redistribute it
+under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary.  Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  , 1 April 1989
+  Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs.  If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library.  If this is what you want to do, use the GNU Library General
+Public License instead of this License.
diff --git a/tools/proctrace/Makefile b/tools/proctrace/Makefile
new file mode 100644
index 00..2983c477fe
--- /dev/null
+++ b/tools/proctrace/Makefile
@@ -0,0 +1,48 @@
+# Copyright (C) CERT Polska - NASK PIB
+# Author: Michał Leszczyński 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; under version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+XEN_ROOT=$(CURDIR)/../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+CFLAGS  += -Werror
+CFLAGS  += $(CFLAGS_libxenevtchn)
+CFLAGS  += $(CFLAGS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenevtchn)
+LDLIBS  += $(LDLIBS_libxen

[PATCH v4 07/10] x86/mm: add vmtrace_buf resource type

2020-06-30 Thread Michał Leszczyński
From: Michal Leszczynski 

Allow mapping the processor trace buffer using
acquire_resource().

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/mm.c           | 25 +++++++++++++++++++++++++
 xen/include/public/memory.h |  1 +
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e376fc7e8f..bb781bd90c 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4624,6 +4624,31 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
 }
 break;
 }
+
+case XENMEM_resource_vmtrace_buf:
+{
+mfn_t mfn;
+unsigned int i;
+struct vcpu *v = domain_vcpu(d, id);
+rc = -EINVAL;
+
+if ( !v )
+break;
+
+if ( !v->arch.vmtrace.pt_buf )
+break;
+
+mfn = page_to_mfn(v->arch.vmtrace.pt_buf);
+
+if ( frame + nr_frames > (v->domain->vmtrace_pt_size >> PAGE_SHIFT) )
+break;
+
+rc = 0;
+for ( i = 0; i < nr_frames; i++ )
+mfn_list[i] = mfn_x(mfn_add(mfn, frame + i));
+
+break;
+}
 #endif
 
 default:
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index dbd35305df..f823c784c3 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -620,6 +620,7 @@ struct xen_mem_acquire_resource {
 
 #define XENMEM_resource_ioreq_server 0
 #define XENMEM_resource_grant_table 1
+#define XENMEM_resource_vmtrace_buf 2
 
 /*
  * IN - a type-specific resource identifier, which must be zero
-- 
2.20.1
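
The range check in the new XENMEM_resource_vmtrace_buf case compares
frame + nr_frames against the buffer size in pages. A possible variant that
cannot wrap around on the addition is sketched below as a standalone
function; frames_in_buffer() and MODEL_PAGE_SHIFT are hypothetical names,
and 4 KiB pages are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/*
 * Return true if frames [frame, frame + nr_frames) lie inside a buffer of
 * buf_size bytes. The comparison is arranged so that no addition can
 * overflow, whatever values the caller passes in.
 */
static bool frames_in_buffer(uint64_t frame, uint32_t nr_frames,
                             uint64_t buf_size)
{
    uint64_t total = buf_size >> MODEL_PAGE_SHIFT;

    return frame <= total && nr_frames <= total - frame;
}
```

Whether the extra care is needed depends on how the callers bound frame and
nr_frames, which this sketch does not model.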




[PATCH v4 08/10] x86/domctl: add XEN_DOMCTL_vmtrace_op

2020-06-30 Thread Michał Leszczyński
From: Michal Leszczynski 

Implement a domctl to manage the runtime state of
the processor trace feature.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/domctl.c   | 48 +
 xen/include/public/domctl.h | 26 
 2 files changed, 74 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 6f2c69788d..a041b724d8 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -322,6 +322,48 @@ void arch_get_domain_info(const struct domain *d,
 info->arch_config.emulation_flags = d->arch.emulation_flags;
 }
 
+static int do_vmtrace_op(struct domain *d, struct xen_domctl_vmtrace_op *op,
+ XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
+{
+int rc;
+struct vcpu *v;
+
+if ( !vmtrace_supported )
+return -EOPNOTSUPP;
+
+if ( !is_hvm_domain(d) )
+return -EOPNOTSUPP;
+
+if ( op->vcpu >= d->max_vcpus )
+return -EINVAL;
+
+v = domain_vcpu(d, op->vcpu);
+rc = 0;
+
+switch ( op->cmd )
+{
+case XEN_DOMCTL_vmtrace_pt_enable:
+case XEN_DOMCTL_vmtrace_pt_disable:
+vcpu_pause(v);
+spin_lock(&d->vmtrace_lock);
+
+rc = vmtrace_control_pt(v, op->cmd == XEN_DOMCTL_vmtrace_pt_enable);
+
+spin_unlock(&d->vmtrace_lock);
+vcpu_unpause(v);
+break;
+
+case XEN_DOMCTL_vmtrace_pt_get_offset:
+rc = vmtrace_get_pt_offset(v, &op->offset);
+break;
+
+default:
+rc = -EOPNOTSUPP;
+}
+
+return rc;
+}
+
 #define MAX_IOPORTS 0x1
 
 long arch_do_domctl(
@@ -337,6 +379,12 @@ long arch_do_domctl(
 switch ( domctl->cmd )
 {
 
+case XEN_DOMCTL_vmtrace_op:
+ret = do_vmtrace_op(d, &domctl->u.vmtrace_op, u_domctl);
+if ( !ret )
+copyback = true;
+   break;
+
 case XEN_DOMCTL_shadow_op:
 ret = paging_domctl(d, &domctl->u.shadow_op, u_domctl, 0);
 if ( ret == -ERESTART )
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 7b8289d436..f836cb5970 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1136,6 +1136,28 @@ struct xen_domctl_vuart_op {
  */
 };
 
+/* XEN_DOMCTL_vmtrace_op: Perform VM tracing related operation */
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+
+struct xen_domctl_vmtrace_op {
+/* IN variable */
+uint32_t cmd;
+/* Enable/disable external vmtrace for given domain */
+#define XEN_DOMCTL_vmtrace_pt_enable  1
+#define XEN_DOMCTL_vmtrace_pt_disable 2
+#define XEN_DOMCTL_vmtrace_pt_get_offset  3
+domid_t domain;
+uint32_t vcpu;
+uint64_aligned_t size;
+
+/* OUT variable */
+uint64_aligned_t offset;
+};
+typedef struct xen_domctl_vmtrace_op xen_domctl_vmtrace_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_vmtrace_op_t);
+
+#endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
+
 struct xen_domctl {
 uint32_t cmd;
 #define XEN_DOMCTL_createdomain   1
@@ -1217,6 +1239,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_vuart_op  81
 #define XEN_DOMCTL_get_cpu_policy82
 #define XEN_DOMCTL_set_cpu_policy83
+#define XEN_DOMCTL_vmtrace_op84
 #define XEN_DOMCTL_gdbsx_guestmemio1000
 #define XEN_DOMCTL_gdbsx_pausevcpu 1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu   1002
@@ -1277,6 +1300,9 @@ struct xen_domctl {
 struct xen_domctl_monitor_opmonitor_op;
 struct xen_domctl_psr_alloc psr_alloc;
 struct xen_domctl_vuart_op  vuart_op;
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+struct xen_domctl_vmtrace_opvmtrace_op;
+#endif
 uint8_t pad[128];
 } u;
 };
-- 
2.20.1




[PATCH v4 03/10] tools/libxl: add vmtrace_pt_size parameter

2020-06-30 Thread Michał Leszczyński
From: Michal Leszczynski 

Allow specifying the size of the per-vCPU trace buffer upon
domain creation. This is zero by default (meaning: not enabled).

Signed-off-by: Michal Leszczynski 
---
 docs/man/xl.cfg.5.pod.in | 10 ++
 tools/golang/xenlight/helpers.gen.go |  2 ++
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/libxl/libxl.h  |  8 
 tools/libxl/libxl_create.c   |  1 +
 tools/libxl/libxl_types.idl  |  2 ++
 tools/xl/xl_parse.c  | 20 
 xen/common/domain.c  | 12 
 xen/include/public/domctl.h  |  1 +
 9 files changed, 57 insertions(+)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 0532739c1f..78f434b722 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -278,6 +278,16 @@ memory=8096 will report significantly less memory available for use
 than a system with maxmem=8096 memory=8096 due to the memory overhead
 of having to track the unused pages.
 
+=item B
+
+Specifies the size of processor trace buffer that would be allocated
+for each vCPU belonging to this domain. Disabled (i.e. B
+by default. This must be set to non-zero value in order to be able to
+use processor tracing features with this domain.
+
+B: The size value must be between 4 kB and 4 GB and it must
+be also a power of 2.
+
 =back
 
 =head3 Guest Virtual NUMA Configuration
diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go
index 935d3bc50a..ecace9634e 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1117,6 +1117,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.Altp2M = Altp2MMode(xc.altp2m)
+x.VmtracePtOrder = int(xc.vmtrace_pt_order)
 
  return nil}
 
@@ -1592,6 +1593,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.altp2m = C.libxl_altp2m_mode(x.Altp2M)
+xc.vmtrace_pt_order = C.int(x.VmtracePtOrder)
 
  return nil
  }
diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go
index 663c1e86b4..f9b07ac862 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -516,6 +516,7 @@ GicVersion GicVersion
 Vuart VuartType
 }
 Altp2M Altp2MMode
+VmtracePtOrder int
 }
 
 type domainBuildInfoTypeUnion interface {
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 71709dc585..891e8e28d6 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -438,6 +438,14 @@
  */
 #define LIBXL_HAVE_CREATEINFO_PASSTHROUGH 1
 
+/*
+ * LIBXL_HAVE_VMTRACE_PT_ORDER indicates that
+ * libxl_domain_create_info has a vmtrace_pt_order parameter, which
+ * allows to enable pre-allocation of processor tracing buffers
+ * with the given order of size.
+ */
+#define LIBXL_HAVE_VMTRACE_PT_ORDER 1
+
 /*
  * libxl ABI compatibility
  *
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 75862dc6ed..651d1f4c0f 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -608,6 +608,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 .max_evtchn_port = b_info->event_channels,
 .max_grant_frames = b_info->max_grant_frames,
 .max_maptrack_frames = b_info->max_maptrack_frames,
+.vmtrace_pt_order = b_info->vmtrace_pt_order,
 };
 
 if (info->type != LIBXL_DOMAIN_TYPE_PV) {
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9d3f05f399..1c5dd43e4d 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -645,6 +645,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
 # supported by x86 HVM and ARM support is planned.
 ("altp2m", libxl_altp2m_mode),
 
+("vmtrace_pt_order", integer),
+
 ], dir=DIR_IN,
copy_deprecated_fn="libxl__domain_build_info_copy_deprecated",
 )
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 61b4ef7b7e..4eba224590 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1861,6 +1861,26 @@ void parse_config_data(const char *config_source,
 }
 }
 
+if (!xlu_cfg_get_long(config, "vmtrace_pt_size", &l, 1) && l) {
+int32_t shift = 0;
+
+if (l & (l - 1))
+{
+fprintf(stderr, "ERROR: pt buffer size must be a power of 2\n");
+exit(1);
+}
+
+while (l >>= 1) ++shift;
+
+if (shift <= XEN_PAGE_SHIFT)
+{
+fprintf(stderr, "ERROR: too small pt buffer\n");
+exit(1);
+}
+
+b_info->vmtrace_pt_order = shift - XEN_PAGE_SHIFT;
+}
+
 if (!xlu_cfg_get_list(config, "ioports", &ioports, &num_ioports, 0)) {
 b_info->num_ioports = num_ioports;
 

[PATCH v4 09/10] tools/libxc: add xc_vmtrace_* functions

2020-06-30 Thread Michał Leszczyński
From: Michal Leszczynski 

Add functions in libxc that use the new XEN_DOMCTL_vmtrace interface.

Signed-off-by: Michal Leszczynski 
---
 tools/libxc/Makefile  |  1 +
 tools/libxc/include/xenctrl.h | 39 +++
 tools/libxc/xc_vmtrace.c  | 73 +++
 3 files changed, 113 insertions(+)
 create mode 100644 tools/libxc/xc_vmtrace.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index fae5969a73..605e44501d 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -27,6 +27,7 @@ CTRL_SRCS-y   += xc_csched2.c
 CTRL_SRCS-y   += xc_arinc653.c
 CTRL_SRCS-y   += xc_rt.c
 CTRL_SRCS-y   += xc_tbuf.c
+CTRL_SRCS-y   += xc_vmtrace.c
 CTRL_SRCS-y   += xc_pm.c
 CTRL_SRCS-y   += xc_cpu_hotplug.c
 CTRL_SRCS-y   += xc_resume.c
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 113ddd935d..66966f6c17 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1585,6 +1585,45 @@ int xc_tbuf_set_cpu_mask(xc_interface *xch, xc_cpumap_t mask);
 
 int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask);
 
+/**
+ * Enable processor trace for given vCPU in given DomU.
+ * Allocate the trace ringbuffer with given size.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_enable(xc_interface *xch, uint32_t domid,
+ uint32_t vcpu);
+
+/**
+ * Disable processor trace for given vCPU in given DomU.
+ * Deallocate the trace ringbuffer.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_disable(xc_interface *xch, uint32_t domid,
+  uint32_t vcpu);
+
+/**
+ * Get current offset inside the trace ringbuffer.
+ * This makes it possible to determine how much data was written into the
+ * buffer. Once the buffer overflows, the offset will reset to 0 and the
+ * previous data will be overwritten.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @parm offset current offset inside trace buffer will be written there
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_get_offset(xc_interface *xch, uint32_t domid,
+ uint32_t vcpu, uint64_t *offset);
+
 int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
 
diff --git a/tools/libxc/xc_vmtrace.c b/tools/libxc/xc_vmtrace.c
new file mode 100644
index 00..32f90a6203
--- /dev/null
+++ b/tools/libxc/xc_vmtrace.c
@@ -0,0 +1,73 @@
+/**
+ * xc_vmtrace.c
+ *
+ * API for manipulating hardware tracing features
+ *
+ * Copyright (c) 2020, Michal Leszczynski
+ *
+ * Copyright 2020 CERT Polska. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "xc_private.h"
+#include 
+
+int xc_vmtrace_pt_enable(
+xc_interface *xch, uint32_t domid, uint32_t vcpu)
+{
+DECLARE_DOMCTL;
+int rc;
+
+domctl.cmd = XEN_DOMCTL_vmtrace_op;
+domctl.domain = domid;
+domctl.u.vmtrace_op.cmd = XEN_DOMCTL_vmtrace_pt_enable;
+domctl.u.vmtrace_op.vcpu = vcpu;
+
+rc = do_domctl(xch, &domctl);
+return rc;
+}
+
+int xc_vmtrace_pt_get_offset(
+xc_interface *xch, uint32_t domid, uint32_t vcpu, uint64_t *offset)
+{
+DECLARE_DOMCTL;
+int rc;
+
+domctl.cmd = XEN_DOMCTL_vmtrace_op;
+domctl.domain = domid;
+domctl.u.vmtrace_op.cmd = XEN_DOMCTL_vmtrace_pt_get_offset;
+domctl.u.vmtrace_op.vcpu = vcpu;
+
+rc = do_domctl(xch, &domctl);
+if ( !rc )
+*offset = domctl.u.vmtrace_op.offset;
+return rc;
+}
+
+int xc_vmtrace_pt_disable(xc_interface *xch, uint32_t domid, uint32_t vcpu)
+{
+DECLARE_DOMCTL;
+int rc;
+
+domctl.cmd = XEN_DOMCTL_vmtrace_op;
+domctl.domain = domid;
+domctl.u.vmtrace_op.cmd = XEN_DOMCTL_vmtrace_pt_disable;
+domctl.u.vmtrace_op.vcpu = vcpu;
+
+rc = do_domctl(xch, &domctl);
+return rc;
+}
+
-- 
2.20.1
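
A consumer of xc_vmtrace_pt_get_offset() has to cope with the wraparound
described in the header comment above. One way to do that is sketched below
as a standalone helper. It assumes the consumer remembers the offset from
its previous poll and that the buffer wrapped at most once in between,
which cannot be verified from the offset alone. trace_copy_new() is a
hypothetical name and is not part of libxc.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the bytes written between two polls of the offset out of a ring
 * buffer of buf_size bytes. dst must have room for buf_size bytes.
 * Returns the number of bytes copied. Equal offsets are treated as
 * "no new data"; out-of-range offsets copy nothing.
 */
static size_t trace_copy_new(const uint8_t *buf, size_t buf_size,
                             uint64_t prev_off, uint64_t cur_off,
                             uint8_t *dst)
{
    if ( prev_off >= buf_size || cur_off >= buf_size )
        return 0;

    if ( cur_off >= prev_off )
    {
        /* No wrap: the new data is one contiguous range. */
        memcpy(dst, buf + prev_off, cur_off - prev_off);
        return cur_off - prev_off;
    }

    /* Wrapped: copy the tail of the buffer first, then its start. */
    memcpy(dst, buf + prev_off, buf_size - prev_off);
    memcpy(dst + (buf_size - prev_off), buf, cur_off);

    return (buf_size - prev_off) + cur_off;
}
```

If the guest can produce more than one buffer's worth of trace between
polls, data is lost regardless, so the polling interval and the buffer size
need to be chosen together.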




[PATCH v4 05/10] common/domain: allocate vmtrace_pt_buffer

2020-06-30 Thread Michał Leszczyński
From: Michal Leszczynski 

Allocate a processor trace buffer for each vCPU when the domain
is created, and deallocate the trace buffers on domain destruction.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/domain.c| 11 +++
 xen/common/domain.c  | 32 
 xen/include/asm-x86/domain.h |  4 
 xen/include/xen/sched.h  |  4 
 4 files changed, 51 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fee6c3931a..0d79fd390c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2199,6 +2199,17 @@ int domain_relinquish_resources(struct domain *d)
 altp2m_vcpu_disable_ve(v);
 }
 
+for_each_vcpu ( d, v )
+{
+if ( !v->arch.vmtrace.pt_buf )
+continue;
+
+vmtrace_destroy_pt(v);
+
+free_domheap_pages(v->arch.vmtrace.pt_buf,
+get_order_from_bytes(v->domain->vmtrace_pt_size));
+}
+
 if ( is_pv_domain(d) )
 {
 for_each_vcpu ( d, v )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 27dcfbac8c..8513659ef8 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -137,6 +137,31 @@ static void vcpu_destroy(struct vcpu *v)
 free_vcpu_struct(v);
 }
 
+static int vmtrace_alloc_buffers(struct vcpu *v)
+{
+struct page_info *pg;
+uint64_t size = v->domain->vmtrace_pt_size;
+
+if ( size < PAGE_SIZE || size > GB(4) || (size & (size - 1)) )
+{
+/*
+ * We don't accept trace buffer size smaller than single page
+ * and the upper bound is defined as 4GB in the specification.
+ * The buffer size must be also a power of 2.
+ */
+return -EINVAL;
+}
+
+pg = alloc_domheap_pages(v->domain, get_order_from_bytes(size),
+ MEMF_no_refcount);
+
+if ( !pg )
+return -ENOMEM;
+
+v->arch.vmtrace.pt_buf = pg;
+return 0;
+}
+
 struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
 {
 struct vcpu *v;
@@ -162,6 +187,9 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
 v->vcpu_id = vcpu_id;
 v->dirty_cpu = VCPU_CPU_CLEAN;
 
+if ( d->vmtrace_pt_size && vmtrace_alloc_buffers(v) != 0 )
+return NULL;
+
 spin_lock_init(&v->virq_lock);
 
 tasklet_init(&v->continue_hypercall_tasklet, NULL, NULL);
@@ -188,6 +216,9 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
 if ( arch_vcpu_create(v) != 0 )
 goto fail_sched;
 
+if ( d->vmtrace_pt_size && vmtrace_init_pt(v) != 0 )
+goto fail_sched;
+
 d->vcpu[vcpu_id] = v;
 if ( vcpu_id != 0 )
 {
@@ -422,6 +453,7 @@ struct domain *domain_create(domid_t domid,
 d->shutdown_code = SHUTDOWN_CODE_INVALID;
 
 spin_lock_init(&d->pbuf_lock);
+spin_lock_init(&d->vmtrace_lock);
 
 rwlock_init(&d->vnuma_rwlock);
 
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 6fd94c2e14..b01c107f5c 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -627,6 +627,10 @@ struct arch_vcpu
 struct {
 bool next_interrupt_enabled;
 } monitor;
+
+struct {
+struct page_info *pt_buf;
+} vmtrace;
 };
 
 struct guest_memory_policy
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ac53519d7f..48f0a61bbd 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -457,6 +457,10 @@ struct domain
 unsignedpbuf_idx;
 spinlock_t  pbuf_lock;
 
+/* Used by vmtrace features */
+spinlock_t  vmtrace_lock;
+uint64_tvmtrace_pt_size;
+
 /* OProfile support. */
 struct xenoprof *xenoprof;
 
-- 
2.20.1




[PATCH v4 04/10] x86/vmx: implement processor tracing for VMX

2020-06-30 Thread Michał Leszczyński
From: Michal Leszczynski 

Use the Intel Processor Trace feature to
provide the vmtrace_pt_* features.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/hvm/vmx/vmx.c | 89 ++
 xen/include/asm-x86/hvm/hvm.h  | 38 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |  3 +
 xen/include/asm-x86/hvm/vmx/vmx.h  | 14 +
 4 files changed, 144 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index ab19d9424e..db3f051b40 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -508,11 +508,24 @@ static void vmx_restore_host_msrs(void)
 
 static void vmx_save_guest_msrs(struct vcpu *v)
 {
+uint64_t rtit_ctl;
+
 /*
  * We cannot cache SHADOW_GS_BASE while the VCPU runs, as it can
  * be updated at any time via SWAPGS, which we cannot trap.
  */
 v->arch.hvm.vmx.shadow_gs = rdgsshadow();
+
+if ( unlikely(v->arch.hvm.vmx.pt_state &&
+  v->arch.hvm.vmx.pt_state->active) )
+{
+rdmsrl(MSR_RTIT_CTL, rtit_ctl);
+BUG_ON(rtit_ctl & RTIT_CTL_TRACEEN);
+
+rdmsrl(MSR_RTIT_STATUS, v->arch.hvm.vmx.pt_state->status);
+rdmsrl(MSR_RTIT_OUTPUT_MASK,
+   v->arch.hvm.vmx.pt_state->output_mask.raw);
+}
 }
 
 static void vmx_restore_guest_msrs(struct vcpu *v)
@@ -524,6 +537,17 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
 
 if ( cpu_has_msr_tsc_aux )
 wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
+
+if ( unlikely(v->arch.hvm.vmx.pt_state &&
+  v->arch.hvm.vmx.pt_state->active) )
+{
+wrmsrl(MSR_RTIT_OUTPUT_BASE,
+   v->arch.hvm.vmx.pt_state->output_base);
+wrmsrl(MSR_RTIT_OUTPUT_MASK,
+   v->arch.hvm.vmx.pt_state->output_mask.raw);
+wrmsrl(MSR_RTIT_STATUS,
+   v->arch.hvm.vmx.pt_state->status);
+}
 }
 
 void vmx_update_cpu_exec_control(struct vcpu *v)
@@ -2240,6 +2264,60 @@ static bool vmx_get_pending_event(struct vcpu *v, struct x86_event *info)
 return true;
 }
 
+static int vmx_init_pt(struct vcpu *v)
+{
+v->arch.hvm.vmx.pt_state = xzalloc(struct pt_state);
+
+if ( !v->arch.hvm.vmx.pt_state )
+return -EFAULT;
+
+if ( !v->arch.vmtrace.pt_buf )
+return -EINVAL;
+
+if ( !v->domain->vmtrace_pt_size )
+   return -EINVAL;
+
+v->arch.hvm.vmx.pt_state->output_base = page_to_maddr(v->arch.vmtrace.pt_buf);
+v->arch.hvm.vmx.pt_state->output_mask.raw = v->domain->vmtrace_pt_size - 1;
+
+if ( vmx_add_host_load_msr(v, MSR_RTIT_CTL, 0) )
+return -EFAULT;
+
+if ( vmx_add_guest_msr(v, MSR_RTIT_CTL,
+  RTIT_CTL_TRACEEN | RTIT_CTL_OS |
+  RTIT_CTL_USR | RTIT_CTL_BRANCH_EN) )
+return -EFAULT;
+
+return 0;
+}
+
+static int vmx_destroy_pt(struct vcpu* v)
+{
+if ( v->arch.hvm.vmx.pt_state )
+xfree(v->arch.hvm.vmx.pt_state);
+
+v->arch.hvm.vmx.pt_state = NULL;
+return 0;
+}
+
+static int vmx_control_pt(struct vcpu *v, bool_t enable)
+{
+if ( !v->arch.hvm.vmx.pt_state )
+return -EINVAL;
+
+v->arch.hvm.vmx.pt_state->active = enable;
+return 0;
+}
+
+static int vmx_get_pt_offset(struct vcpu *v, uint64_t *offset)
+{
+if ( !v->arch.hvm.vmx.pt_state )
+return -EINVAL;
+
+*offset = v->arch.hvm.vmx.pt_state->output_mask.offset;
+return 0;
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
 .name = "VMX",
 .cpu_up_prepare   = vmx_cpu_up_prepare,
@@ -2295,6 +2373,10 @@ static struct hvm_function_table __initdata vmx_function_table = {
 .altp2m_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
 .altp2m_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
 .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
+.vmtrace_init_pt = vmx_init_pt,
+.vmtrace_destroy_pt = vmx_destroy_pt,
+.vmtrace_control_pt = vmx_control_pt,
+.vmtrace_get_pt_offset = vmx_get_pt_offset,
 .tsc_scaling = {
 .max_ratio = VMX_TSC_MULTIPLIER_MAX,
 },
@@ -3674,6 +3756,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
 hvm_invalidate_regs_fields(regs);
 
+if ( unlikely(v->arch.hvm.vmx.pt_state &&
+  v->arch.hvm.vmx.pt_state->active) )
+{
+rdmsrl(MSR_RTIT_OUTPUT_MASK,
+   v->arch.hvm.vmx.pt_state->output_mask.raw);
+}
+
 if ( paging_mode_hap(v->domain) )
 {
 /*
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1eb377dd82..8f194889e5 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -214,6 +214,12 @@ struct hvm_function_table {
 bool_t (*altp2m_vcpu_emulate_ve)(struct vcpu *v);
 int (*altp2m_vcpu_emulate_vmfunc)(const struct cpu_user_regs *regs);
 
+/* vmtrace */
+int (*vmtrace_init_pt)(struct vcpu *v);
+int (*vmtrace_destroy_pt)(struct vcpu *v);
+int 

[PATCH v4 01/10] x86/vmx: add Intel PT MSR definitions

2020-06-30 Thread Michał Leszczyński
From: Michal Leszczynski 

Define constants related to Intel Processor Trace features.

Signed-off-by: Michal Leszczynski 
---
 xen/include/asm-x86/msr-index.h | 37 +
 1 file changed, 37 insertions(+)

diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index b328a47ed8..0203029be9 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -69,6 +69,43 @@
 #define MSR_MCU_OPT_CTRL                    0x0123
 #define  MCU_OPT_CTRL_RNGDS_MITG_DIS        (_AC(1, ULL) <<  0)
 
+/* Intel PT MSRs */
+#define MSR_RTIT_OUTPUT_BASE                0x0560
+
+#define MSR_RTIT_OUTPUT_MASK                0x0561
+
+#define MSR_RTIT_CTL                        0x0570
+#define  RTIT_CTL_TRACEEN                   (_AC(1, ULL) <<  0)
+#define  RTIT_CTL_CYCEN                     (_AC(1, ULL) <<  1)
+#define  RTIT_CTL_OS                        (_AC(1, ULL) <<  2)
+#define  RTIT_CTL_USR                       (_AC(1, ULL) <<  3)
+#define  RTIT_CTL_PWR_EVT_EN                (_AC(1, ULL) <<  4)
+#define  RTIT_CTL_FUP_ON_PTW                (_AC(1, ULL) <<  5)
+#define  RTIT_CTL_FABRIC_EN                 (_AC(1, ULL) <<  6)
+#define  RTIT_CTL_CR3_FILTER                (_AC(1, ULL) <<  7)
+#define  RTIT_CTL_TOPA                      (_AC(1, ULL) <<  8)
+#define  RTIT_CTL_MTC_EN                    (_AC(1, ULL) <<  9)
+#define  RTIT_CTL_TSC_EN                    (_AC(1, ULL) <<  10)
+#define  RTIT_CTL_DIS_RETC                  (_AC(1, ULL) <<  11)
+#define  RTIT_CTL_PTW_EN                    (_AC(1, ULL) <<  12)
+#define  RTIT_CTL_BRANCH_EN                 (_AC(1, ULL) <<  13)
+#define  RTIT_CTL_MTC_FREQ                  (_AC(0x0F, ULL) <<  14)
+#define  RTIT_CTL_CYC_THRESH                (_AC(0x0F, ULL) <<  19)
+#define  RTIT_CTL_PSB_FREQ                  (_AC(0x0F, ULL) <<  24)
+#define  RTIT_CTL_ADDR(n)                   (_AC(0x0F, ULL) <<  (32 + (4 * (n))))
+
+#define MSR_RTIT_STATUS                     0x0571
+#define  RTIT_STATUS_FILTER_EN              (_AC(1, ULL) <<  0)
+#define  RTIT_STATUS_CONTEXT_EN             (_AC(1, ULL) <<  1)
+#define  RTIT_STATUS_TRIGGER_EN             (_AC(1, ULL) <<  2)
+#define  RTIT_STATUS_ERROR                  (_AC(1, ULL) <<  4)
+#define  RTIT_STATUS_STOPPED                (_AC(1, ULL) <<  5)
+#define  RTIT_STATUS_BYTECNT                (_AC(0x1, ULL) <<  32)
+
+#define MSR_RTIT_CR3_MATCH                  0x0572
+#define MSR_RTIT_ADDR_A(n)                  (0x0580 + (n) * 2)
+#define MSR_RTIT_ADDR_B(n)                  (0x0581 + (n) * 2)
+
 #define MSR_U_CET   0x06a0
 #define MSR_S_CET   0x06a2
 #define  CET_SHSTK_EN   (_AC(1, ULL) <<  0)
-- 
2.20.1
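
For readers following the series: the VMX patch loads RTIT_CTL_TRACEEN | RTIT_CTL_OS | RTIT_CTL_USR | RTIT_CTL_BRANCH_EN into the guest MSR load list. Below is a small standalone sketch of how these bit definitions combine. It uses plain ULL constants in place of Xen's _AC() macro, and the two helper functions are hypothetical, added only for illustration.

```c
#include <stdint.h>

/* Same bit positions as the definitions in this patch. */
#define RTIT_CTL_TRACEEN   (1ULL << 0)
#define RTIT_CTL_OS        (1ULL << 2)
#define RTIT_CTL_USR       (1ULL << 3)
#define RTIT_CTL_BRANCH_EN (1ULL << 13)
#define RTIT_CTL_MTC_FREQ  (0x0FULL << 14)

/* Hypothetical helper: the control value the VMX patch uses for guests. */
uint64_t rtit_ctl_guest_default(void)
{
    return RTIT_CTL_TRACEEN | RTIT_CTL_OS |
           RTIT_CTL_USR | RTIT_CTL_BRANCH_EN;
}

/* Hypothetical helper: extract the 4-bit MTC frequency field. */
unsigned int rtit_ctl_mtc_freq(uint64_t ctl)
{
    return (unsigned int)((ctl & RTIT_CTL_MTC_FREQ) >> 14);
}
```

The combined value is 0x200D: bits 0, 2, 3 and 13 set, i.e. tracing enabled for both CPL 0 and CPL > 0 with branch packets on.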




[PATCH v4 02/10] x86/vmx: add IPT cpu feature

2020-06-30 Thread Michał Leszczyński
From: Michal Leszczynski 

Check if Intel Processor Trace feature is supported by current
processor. Define vmtrace_supported global variable.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/hvm/vmx/vmcs.c | 7 ++-
 xen/common/domain.c | 2 ++
 xen/include/asm-x86/cpufeature.h| 1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  | 1 +
 xen/include/public/arch-x86/cpufeatureset.h | 1 +
 xen/include/xen/domain.h| 2 ++
 6 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index ca94c2bedc..b73d824357 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -291,6 +291,12 @@ static int vmx_init_vmcs_config(void)
 _vmx_cpu_based_exec_control &=
 ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING);
 
+rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
+
+/* Check whether IPT is supported in VMX operation. */
+vmtrace_supported = cpu_has_ipt &&
+(_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
+
 if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS )
 {
 min = 0;
@@ -305,7 +311,6 @@ static int vmx_init_vmcs_config(void)
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
SECONDARY_EXEC_XSAVES |
SECONDARY_EXEC_TSC_SCALING);
-rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
 if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
 opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
 if ( opt_vpid_enabled )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 7cc9526139..0a33e0dfd6 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -82,6 +82,8 @@ struct vcpu *idle_vcpu[NR_CPUS] __read_mostly;
 
 vcpu_info_t dummy_vcpu_info;
 
+bool_t vmtrace_supported;
+
 static void __domain_finalise_shutdown(struct domain *d)
 {
 struct vcpu *v;
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index f790d5c1f8..8d7955dd87 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -104,6 +104,7 @@
 #define cpu_has_clwb            boot_cpu_has(X86_FEATURE_CLWB)
 #define cpu_has_avx512er        boot_cpu_has(X86_FEATURE_AVX512ER)
 #define cpu_has_avx512cd        boot_cpu_has(X86_FEATURE_AVX512CD)
+#define cpu_has_ipt             boot_cpu_has(X86_FEATURE_IPT)
 #define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
 #define cpu_has_avx512bw        boot_cpu_has(X86_FEATURE_AVX512BW)
 #define cpu_has_avx512vl        boot_cpu_has(X86_FEATURE_AVX512VL)
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 906810592f..0e9a0b8de6 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -283,6 +283,7 @@ extern u32 vmx_secondary_exec_control;
 #define VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL 0x800ULL
 extern u64 vmx_ept_vpid_cap;
 
+#define VMX_MISC_PT_SUPPORTED   0x4000
 #define VMX_MISC_CR3_TARGET     0x01ff0000
 #define VMX_MISC_VMWRITE_ALL    0x20000000
 
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 5ca35d9d97..0d3f15f628 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -217,6 +217,7 @@ XEN_CPUFEATURE(SMAP,  5*32+20) /*S  Supervisor Mode Access Prevention */
 XEN_CPUFEATURE(AVX512_IFMA,   5*32+21) /*A  AVX-512 Integer Fused Multiply Add */
 XEN_CPUFEATURE(CLFLUSHOPT,5*32+23) /*A  CLFLUSHOPT instruction */
 XEN_CPUFEATURE(CLWB,  5*32+24) /*A  CLWB instruction */
+XEN_CPUFEATURE(IPT,   5*32+25) /*   Intel Processor Trace */
 XEN_CPUFEATURE(AVX512PF,  5*32+26) /*A  AVX-512 Prefetch Instructions */
 XEN_CPUFEATURE(AVX512ER,  5*32+27) /*A  AVX-512 Exponent & Reciprocal Instrs */
 XEN_CPUFEATURE(AVX512CD,  5*32+28) /*A  AVX-512 Conflict Detection Instrs */
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 7e51d361de..6c786a56c2 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -130,4 +130,6 @@ struct vnuma_info {
 
 void vnuma_destroy(struct vnuma_info *vnuma);
 
+extern bool_t vmtrace_supported;
+
 #endif /* __XEN_DOMAIN_H__ */
-- 
2.20.1
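
To summarise the check this patch adds to vmx_init_vmcs_config(): tracing is reported as supported only when the CPU advertises Intel PT and IA32_VMX_MISC says PT may be used in VMX operation. Below is a minimal sketch of that predicate as a pure function. The function name and the CPUID7_EBX_IPT macro are hypothetical; the sketch assumes featureset word 5 corresponds to CPUID leaf 7, subleaf 0, EBX, so that 5*32+25 is EBX bit 25.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed mapping: XEN_CPUFEATURE(IPT, 5*32+25) is CPUID.7.0:EBX bit 25. */
#define CPUID7_EBX_IPT        (1u << 25)
/* Bit 14 of IA32_VMX_MISC, as defined by this patch. */
#define VMX_MISC_PT_SUPPORTED 0x00004000u

/* Hypothetical helper mirroring the vmtrace_supported computation. */
bool vmtrace_supported_sketch(uint32_t cpuid7_ebx, uint64_t vmx_misc)
{
    return (cpuid7_ebx & CPUID7_EBX_IPT) &&
           (vmx_misc & VMX_MISC_PT_SUPPORTED);
}
```

Both conditions are needed: a CPU can implement Intel PT without allowing it inside VMX operation.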




[PATCH v4 00/10] Implement support for external IPT monitoring

2020-06-30 Thread Michał Leszczyński
From: Michal Leszczynski 

Intel Processor Trace is an architectural extension available in modern Intel
CPUs. It records a detailed trace of activity while the processor executes
code. The recorded trace can be used to reconstruct the code flow, that is,
to find out which code paths were executed, which branches were taken, and
so forth.

The abovementioned feature is described in Intel(R) 64 and IA-32 Architectures 
Software Developer's Manual Volume 3C: System Programming Guide, Part 3, 
Chapter 36: "Intel Processor Trace."

This patch series implements an interface that Dom0 could use in order to 
enable IPT for particular vCPUs in DomU, allowing for external monitoring. Such 
a feature has numerous applications like malware monitoring, fuzzing, or 
performance testing.

Thanks also to Tamas K Lengyel for a few preliminary hints before the
first version of this patch was submitted to xen-devel.

Changed since v1:
  * MSR_RTIT_CTL is managed using MSR load lists
  * other PT-related MSRs are modified only when vCPU goes out of context
  * trace buffer is now acquired as a resource
  * added vmtrace_pt_size parameter in xl.cfg; the size of the trace buffer
    must be specified at domain creation time
  * trace buffers are allocated on domain creation and destroyed on
    domain destruction
  * HVMOP_vmtrace_ipt_enable/disable is limited to enabling/disabling PT;
    these calls don't manage buffer memory anymore
  * lifted 32 MFN/GFN array limit when acquiring resources
  * minor code style changes according to review

Changed since v2:
  * trace buffer is now allocated on domain creation (in v2 it was
allocated when hvm param was set)
  * restored 32-item limit in mfn/gfn arrays in acquire_resource
and instead implemented hypercall continuations
  * code changes according to Jan's and Roger's review

Changed since v3:
  * vmtrace HVMOPs are now implemented as DOMCTLs
  * patches split up according to Andrew's comments
  * code changes according to v3 review on the mailing list
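
The batching mentioned under "Changed since v2" (a 32-item limit per pass, with the remainder handled by a continuation) can be pictured with the following sketch. It is a simplified userspace model, not the code from the series: the function names are hypothetical, and Xen's real hypercall continuation mechanism is replaced by a plain loop that re-enters with the progress made so far.

```c
#include <stddef.h>

/* The per-pass limit on mfn/gfn array entries noted in the changelog. */
#define BATCH_LIMIT 32u

/*
 * Hypothetical model of one pass: handle at most BATCH_LIMIT frames
 * starting at index `done`, and return the updated progress counter.
 */
size_t process_batch(size_t done, size_t total)
{
    size_t todo = total - done;

    if ( todo > BATCH_LIMIT )
        todo = BATCH_LIMIT;

    /* Real code would look up and map frames [done, done + todo) here. */
    return done + todo;
}

/* Count how many passes a request for `total` frames needs. */
unsigned int count_batches(size_t total)
{
    unsigned int passes = 0;
    size_t done = 0;

    while ( done < total )
    {
        done = process_batch(done, total);
        passes++;
    }

    return passes;
}
```

A request for 100 frames therefore takes four passes (32 + 32 + 32 + 4), each of which fits within the fixed-size on-stack arrays.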


Michal Leszczynski (10):
  x86/vmx: add Intel PT MSR definitions
  x86/vmx: add IPT cpu feature
  tools/libxl: add vmtrace_pt_size parameter
  x86/vmx: implement processor tracing for VMX
  common/domain: allocate vmtrace_pt_buffer
  memory: batch processing in acquire_resource()
  x86/mm: add vmtrace_buf resource type
  x86/domctl: add XEN_DOMCTL_vmtrace_op
  tools/libxc: add xc_vmtrace_* functions
  tools/proctrace: add proctrace tool

 docs/man/xl.cfg.5.pod.in|  10 +
 tools/golang/xenlight/helpers.gen.go|   2 +
 tools/golang/xenlight/types.gen.go  |   1 +
 tools/libxc/Makefile|   1 +
 tools/libxc/include/xenctrl.h   |  39 +++
 tools/libxc/xc_vmtrace.c|  73 +
 tools/libxl/libxl.h |   8 +
 tools/libxl/libxl_create.c  |   1 +
 tools/libxl/libxl_types.idl |   2 +
 tools/proctrace/COPYING | 339 
 tools/proctrace/Makefile|  48 +++
 tools/proctrace/proctrace.c | 163 ++
 tools/xl/xl_parse.c |  20 ++
 xen/arch/x86/domain.c   |  11 +
 xen/arch/x86/domctl.c   |  48 +++
 xen/arch/x86/hvm/vmx/vmcs.c |   7 +-
 xen/arch/x86/hvm/vmx/vmx.c  |  89 +
 xen/arch/x86/mm.c   |  25 ++
 xen/common/domain.c |  46 +++
 xen/common/memory.c |  32 +-
 xen/include/asm-x86/cpufeature.h|   1 +
 xen/include/asm-x86/domain.h|   4 +
 xen/include/asm-x86/hvm/hvm.h   |  38 +++
 xen/include/asm-x86/hvm/vmx/vmcs.h  |   4 +
 xen/include/asm-x86/hvm/vmx/vmx.h   |  14 +
 xen/include/asm-x86/msr-index.h |  37 +++
 xen/include/public/arch-x86/cpufeatureset.h |   1 +
 xen/include/public/domctl.h |  27 ++
 xen/include/public/memory.h |   1 +
 xen/include/xen/domain.h|   2 +
 xen/include/xen/sched.h |   4 +
 31 files changed, 1094 insertions(+), 4 deletions(-)
 create mode 100644 tools/libxc/xc_vmtrace.c
 create mode 100644 tools/proctrace/COPYING
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 tools/proctrace/proctrace.c

-- 
2.20.1




Re: [PATCH v3 4/7] x86/vmx: add do_vmtrace_op

2020-06-29 Thread Michał Leszczyński
- 23 Jun 2020 at 13:54, Andrew Cooper andrew.coop...@citrix.com wrote:
> Overall, the moving parts of this series needs to split out into rather
> more patches.
> 
> First, in patch 3, the hvm_funcs.pt_supported isn't the place for that
> to live.  You want a global "bool vmtrace_supported" in common/domain.c
> which vmx_init_vmcs_config() sets, and the ARM code can set in the
> future when CoreSight is added.
> 
> Next, you want a patch in isolation which adds vmtrace_pt_size (or
> whatever it ends up being) to createdomain, where all
> allocation/deallocation logic lives in common/domain.c.  The spinlock
> (if its needed, but I don't think it is) wants initialising early in
> domain_create(), alongside d->pbuf_lock, and you also need an extra
> clause in sanitise_domain_config() which rejects a vmtrace setting if
> vmtrace isn't supported.  You'll need to put the struct page_info *
> pointer to the memory allocation in struct vcpu, and adjust the vcpu
> create/destroy logic appropriately.
> 
> Next, you want a patch doing the acquire resource logic for userspace to
> map the buffers.
> 
> Next, you want a patch to introduce a domctl with the various runtime
> enable/disable settings which were in an hvmop here.
> 
> Next, you want a patch to do the VMX plumbing, both at create, and runtime.
> 
> This ought to lay the logic out in a way which is extendable to x86 PV
> guests and ARM CoreSight, and oughtn't to explode when creating guests
> on non-Intel hardware.
> 
> Thanks,
> 
> ~Andrew


Thanks for your review; I'm almost done addressing all these remarks.
I've converted the HVMOP to a DOMCTL and split the patches into smaller pieces.

I will send v4 soon.


Best regards,
Michał Leszczyński



Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op

2020-06-24 Thread Michał Leszczyński
- 23 Jun 2020 at 19:24, Andrew Cooper andrew.coop...@citrix.com wrote:
> On 23/06/2020 09:51, Jan Beulich wrote:
>> I'd still like to see an explicit confirmation by him that this
>> use of memory is indeed what he has intended. There are much smaller
>> amounts of memory which we allocate on demand, just to avoid
>> allocating some without then ever using it.
> 
> PT is a debug/diagnostic tool.  Its not something you'd run in
> production against a production VM.
> 
> It's off by default (by virtue of having to explicitly ask to use it in
> the first place), and those who've asked for it don't want to be finding
> -ENOMEM after the domain has been running for a few seconds (or midway
> through the vcpus), when they inveterately want to map the rings.
> 
> Those who request buffers in the first place and forget about them are
> not semantically different from those who ask for a silly shadow memory
> limit, or typo the guest memory and give it too much.  Its a admin
> error, not a safety/correctness issue.
> 
> ~Andrew


Absolutely +1.

Suppose somebody runs an advanced scenario with many domains (e.g. 20).
It's much better to have 19 domains working fine and 1 failing up front
with -ENOMEM than to have all 20 domains randomly crashing at runtime
because it turned out there is a shortage of memory.


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op

2020-06-22 Thread Michał Leszczyński
- 22 Jun 2020 at 18:16, Jan Beulich jbeul...@suse.com wrote:

> On 22.06.2020 18:02, Michał Leszczyński wrote:
>> - 22 Jun 2020 at 17:22, Jan Beulich jbeul...@suse.com wrote:
>>> On 22.06.2020 16:35, Michał Leszczyński wrote:
>>>> - 22 Jun 2020 at 15:25, Jan Beulich jbeul...@suse.com wrote:
>>>>> Is any of what you do in this switch() actually legitimate without
>>>>> hvm_set_vmtrace_pt_size() having got called for the guest? From
>>>>> remarks elsewhere I imply you expect the param that you currently
>>>>> use to be set upon domain creation time, but at the very least the
>>>>> potentially big buffer should imo not get allocated up front, but
>>>>> only when tracing is to actually be enabled.
>>>>
>>>> Wait... so you want to allocate these buffers in runtime?
>>>> Previously we were talking that there is too much runtime logic
>>>> and these enable/disable hypercalls should be stripped to absolute
>>>> minimum.
>>>
>>> Basic arrangements can be made at domain creation time. I don't
>>> think though that it would be a good use of memory if you
>>> allocated perhaps many gigabytes of memory just for possibly
>>> wanting to enable tracing on a guest.
>>>
>> 
>> From our previous conversations I thought that you want to have
>> as much logic moved to the domain creation as possible.
>> 
>> Thus, a parameter "vmtrace_pt_size" was introduced. By default it's
>> zero (= disabled), if you set it to a non-zero value, then trace buffers
>> of given size will be allocated for the domain and you have possibility
>> to use ipt_enable/ipt_disable at any moment.
>> 
>> This way the runtime logic is as thin as possible. I assume user knows
>> in advance whether he/she would want to use external monitoring with IPT
>> or not.
> 
> Andrew - I think you requested movement to domain_create(). Could
> you clarify if indeed you mean to also allocate the big buffers
> this early?

I would like to recall what Andrew wrote a few days ago:

- 16 Jun 2020 at 22:16, Andrew Cooper andrew.coop...@citrix.com wrote:
> Xen has traditionally opted for a "and turn this extra thing on
> dynamically" model, but this has caused no end of security issues and
> broken corner cases.
> 
> You can see this still existing in the difference between
> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
> required to chose the number of vcpus for the domain) and we're making
> good progress undoing this particular wart (before 4.13, it was
> concerning easy to get Xen to fall over a NULL d->vcpu[] pointer by
> issuing other hypercalls between these two).
> 
> There is a lot of settings which should be immutable for the lifetime of
> the domain, and external monitoring looks like another one of these.
> Specifying it at createdomain time allows for far better runtime
> behaviour (you are no longer in a situation where the first time you try
> to turn tracing on, you end up with -ENOMEM because another VM booted in
> the meantime and used the remaining memory), and it makes for rather
> more simple code in Xen itself (at runtime, you can rely on it having
> been set up properly, because a failure setting up will have killed the
> domain already).
> 
> ...
> 
> ~Andrew

Following this quote, I've moved buffer allocation into domain creation;
the updated version was already sent to the list as patch v3.

Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v3 0/7] Implement support for external IPT monitoring

2020-06-22 Thread Michał Leszczyński
- 22 Jun 2020 at 20:06, Michał Leszczyński michal.leszczyn...@cert.pl wrote:

> This patch series implements an interface that Dom0 could use in order to
> enable IPT for particular vCPUs in DomU, allowing for external monitoring. 
> Such
> a feature has numerous applications like malware monitoring, fuzzing, or
> performance testing.


There is also a git branch with these patches:
https://github.com/icedevml/xen/tree/ipt-patch-v3b



[PATCH v3 7/7] tools/proctrace: add proctrace tool

2020-06-22 Thread Michał Leszczyński
ERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+END OF TERMS AND CONDITIONS
+
+   How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+<one line to give the program's name and a brief idea of what it does.>
+Copyright (C) <year>  <name of author>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; If not, see <http://www.gnu.org/licenses/>.
+
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+Gnomovision version 69, Copyright (C) year name of author
+Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+This is free software, and you are welcome to redistribute it
+under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary.  Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  <signature of Ty Coon>, 1 April 1989
+  Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs.  If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library.  If this is what you want to do, use the GNU Library General
+Public License instead of this License.
diff --git a/tools/proctrace/Makefile b/tools/proctrace/Makefile
new file mode 100644
index 0000000000..76d7387a64
--- /dev/null
+++ b/tools/proctrace/Makefile
@@ -0,0 +1,50 @@
+# Copyright (C) CERT Polska - NASK PIB
+# Author: Michał Leszczyński 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; under version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+XEN_ROOT=$(CURDIR)/../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+CFLAGS  += -Werror
+CFLAGS  += $(CFLAGS_libxenevtchn)
+CFLAGS  += $(CFLAGS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenevtchn)
+LDLIBS  += $(LDLIBS_libxenforeignmemory)
+
+# SCRIP

[PATCH v3 6/7] tools/libxl: add vmtrace_pt_size parameter

2020-06-22 Thread Michał Leszczyński
Allow specifying the size of the per-vCPU trace buffer upon
domain creation. The size is zero by default (meaning: not enabled).

Signed-off-by: Michal Leszczynski 
---
 docs/man/xl.cfg.5.pod.in | 10 ++
 tools/golang/xenlight/helpers.gen.go |  2 ++
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/libxl/libxl_create.c   |  1 +
 tools/libxl/libxl_types.idl  |  2 ++
 tools/xl/xl_parse.c  |  4 
 6 files changed, 20 insertions(+)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 0532739c1f..78f434b722 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -278,6 +278,16 @@ memory=8096 will report significantly less memory available for use
 than a system with maxmem=8096 memory=8096 due to the memory overhead
 of having to track the unused pages.
 
+=item B
+
+Specifies the size of processor trace buffer that would be allocated
+for each vCPU belonging to this domain. Disabled (i.e. B
+by default. This must be set to non-zero value in order to be able to
+use processor tracing features with this domain.
+
+B: The size value must be between 4 kB and 4 GB and it must
+be also a power of 2.
+
 =back
 
 =head3 Guest Virtual NUMA Configuration
diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go
index 935d3bc50a..986ebbd681 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1117,6 +1117,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.Altp2M = Altp2MMode(xc.altp2m)
+x.VmtracePtSize = int(xc.vmtrace_pt_size)
 
  return nil}
 
@@ -1592,6 +1593,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.altp2m = C.libxl_altp2m_mode(x.Altp2M)
+xc.vmtrace_pt_size = C.int(x.VmtracePtSize)
 
  return nil
  }
diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go
index 663c1e86b4..41ec7cdd32 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -516,6 +516,7 @@ GicVersion GicVersion
 Vuart VuartType
 }
 Altp2M Altp2MMode
+VmtracePtSize int
 }
 
 type domainBuildInfoTypeUnion interface {
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 75862dc6ed..32204b83b0 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -608,6 +608,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 .max_evtchn_port = b_info->event_channels,
 .max_grant_frames = b_info->max_grant_frames,
 .max_maptrack_frames = b_info->max_maptrack_frames,
+.vmtrace_pt_size = b_info->vmtrace_pt_size,
 };
 
 if (info->type != LIBXL_DOMAIN_TYPE_PV) {
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9d3f05f399..04c1704b72 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -645,6 +645,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
 # supported by x86 HVM and ARM support is planned.
 ("altp2m", libxl_altp2m_mode),
 
+("vmtrace_pt_size", integer),
+
 ], dir=DIR_IN,
copy_deprecated_fn="libxl__domain_build_info_copy_deprecated",
 )
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 61b4ef7b7e..6ab98dda55 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1861,6 +1861,10 @@ void parse_config_data(const char *config_source,
 }
 }
 
+if (!xlu_cfg_get_long(config, "vmtrace_pt_size", &l, 1)) {
+b_info->vmtrace_pt_size = l;
+}
+
 if (!xlu_cfg_get_list(config, "ioports", &ioports, &num_ioports, 0)) {
 b_info->num_ioports = num_ioports;
 b_info->ioports = calloc(num_ioports, sizeof(*b_info->ioports));
-- 
2.20.1
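
The constraint documented in the xl.cfg hunk above (zero means disabled; otherwise a power of 2 between 4 kB and 4 GB) could be checked along these lines. This is only a sketch: the function name is hypothetical, the bounds are assumed to be inclusive, and this patch itself does not add such a check to the toolstack.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical validator for a vmtrace_pt_size value in bytes.
 * Assumption: both the 4 kB and the 4 GB bounds are inclusive.
 */
bool vmtrace_pt_size_valid(uint64_t size)
{
    const uint64_t min = 4ULL << 10;   /* 4 kB */
    const uint64_t max = 4ULL << 30;   /* 4 GB */

    if ( size == 0 )
        return true;                   /* tracing disabled */

    if ( size < min || size > max )
        return false;

    /* A power of two has exactly one bit set. */
    return (size & (size - 1)) == 0;
}
```

The power-of-two requirement matters because the hypervisor side derives the hardware output mask as the size minus one.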




[PATCH v3 5/7] tools/libxc: add xc_vmtrace_* functions

2020-06-22 Thread Michał Leszczyński
Add functions in libxc that use the new HVMOP_vmtrace interface.

Signed-off-by: Michal Leszczynski 
---
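A note for consumers of xc_vmtrace_pt_get_offset(): since the offset resets to 0 when the buffer overflows, a tool polling the offset has to handle wrap-around itself. Below is a minimal sketch of that arithmetic. The helper name is hypothetical, and it assumes the buffer wraps at most once between two polls; a tool that polls too slowly cannot detect lost data this way.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of new trace bytes between two polls.
 * prev/cur are offsets returned by successive get_offset calls and
 * size is the trace buffer size in bytes.
 * Assumption: at most one wrap-around happened in between.
 */
uint64_t pt_new_bytes(uint64_t prev, uint64_t cur, uint64_t size)
{
    if ( cur >= prev )
        return cur - prev;             /* no wrap: contiguous region */

    return (size - prev) + cur;        /* wrapped: tail plus new head */
}
```

In the wrapped case the consumer would read the region from prev to the end of the buffer first, then from the start of the buffer up to cur.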
 tools/libxc/Makefile  |  1 +
 tools/libxc/include/xenctrl.h | 39 +++
 tools/libxc/xc_vmtrace.c  | 94 +++
 3 files changed, 134 insertions(+)
 create mode 100644 tools/libxc/xc_vmtrace.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index fae5969a73..605e44501d 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -27,6 +27,7 @@ CTRL_SRCS-y   += xc_csched2.c
 CTRL_SRCS-y   += xc_arinc653.c
 CTRL_SRCS-y   += xc_rt.c
 CTRL_SRCS-y   += xc_tbuf.c
+CTRL_SRCS-y   += xc_vmtrace.c
 CTRL_SRCS-y   += xc_pm.c
 CTRL_SRCS-y   += xc_cpu_hotplug.c
 CTRL_SRCS-y   += xc_resume.c
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 113ddd935d..66966f6c17 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1585,6 +1585,45 @@ int xc_tbuf_set_cpu_mask(xc_interface *xch, xc_cpumap_t mask);
 
 int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask);
 
+/**
+ * Enable processor trace for given vCPU in given DomU.
+ * Allocate the trace ringbuffer with given size.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_enable(xc_interface *xch, uint32_t domid,
+ uint32_t vcpu);
+
+/**
+ * Disable processor trace for given vCPU in given DomU.
+ * Deallocate the trace ringbuffer.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_disable(xc_interface *xch, uint32_t domid,
+  uint32_t vcpu);
+
+/**
+ * Get current offset inside the trace ringbuffer.
+ * This allows determining how much data was written into the buffer.
+ * Once the buffer overflows, the offset will reset to 0 and the previous
+ * data will be overwritten.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @parm offset current offset inside trace buffer will be written there
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_pt_get_offset(xc_interface *xch, uint32_t domid,
+ uint32_t vcpu, uint64_t *offset);
+
 int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
 
diff --git a/tools/libxc/xc_vmtrace.c b/tools/libxc/xc_vmtrace.c
new file mode 100644
index 0000000000..79aad2d9a8
--- /dev/null
+++ b/tools/libxc/xc_vmtrace.c
@@ -0,0 +1,94 @@
+/**
+ * xc_vmtrace.c
+ *
+ * API for manipulating hardware tracing features
+ *
+ * Copyright (c) 2020, Michal Leszczynski
+ *
+ * Copyright 2020 CERT Polska. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see .
+ */
+
+#include "xc_private.h"
+#include 
+
+int xc_vmtrace_pt_enable(
+xc_interface *xch, uint32_t domid, uint32_t vcpu)
+{
+DECLARE_HYPERCALL_BUFFER(xen_hvm_vmtrace_op_t, arg);
+int rc = -1;
+
+arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+if ( arg == NULL )
+return -1;
+
+arg->cmd = HVMOP_vmtrace_pt_enable;
+arg->domain = domid;
+arg->vcpu = vcpu;
+
+rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op, HVMOP_vmtrace,
+  HYPERCALL_BUFFER_AS_ARG(arg));
+
+xc_hypercall_buffer_free(xch, arg);
+return rc;
+}
+
+int xc_vmtrace_pt_get_offset(
+xc_interface *xch, uint32_t domid, uint32_t vcpu, uint64_t *offset)
+{
+DECLARE_HYPERCALL_BUFFER(xen_hvm_vmtrace_op_t, arg);
+int rc = -1;
+
+arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+if ( arg == NULL )
+return -1;
+
+arg->cmd = HVMOP_vmtrace_pt_get_offset;
+arg->domain = domid;
+arg->vcpu = vcpu;
+
+rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op, HVMOP_vmtrace,
+  HYPERCALL_BUFFER_AS_ARG(arg));
+
+if ( rc == 0 )
+{
+*offset = arg->offset;
+}
+
+xc_hypercall_buffer_free(xch, arg);
+return rc;
+}
+
+int xc_vmtrace_pt_disable(xc_interface 
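
Note on the get_offset semantics documented in xenctrl.h above: the offset
resets to 0 once the buffer overflows. A minimal sketch of how a consumer
might copy out the bytes written between two polls is given here. It is
illustrative only: demo_drain() and its parameters are hypothetical names,
not part of libxc, and the sketch assumes the buffer wrapped at most once
between the two offsets.

```c
#include <stdint.h>
#include <string.h>

/*
 * Copy the bytes written between offsets 'last' and 'cur' of a ring buffer
 * of 'size' bytes into 'out' (which must hold at least 'size' bytes).
 * Returns the number of bytes copied. cur < last is taken to mean that the
 * buffer wrapped exactly once; multiple wraps cannot be detected this way.
 */
static uint64_t demo_drain(const uint8_t *buf, uint64_t size,
                           uint64_t last, uint64_t cur, uint8_t *out)
{
    if ( cur >= last )
    {
        /* No wrap: the new data is one contiguous range. */
        memcpy(out, buf + last, cur - last);
        return cur - last;
    }

    /* Wrapped: copy the tail of the buffer first, then the head. */
    memcpy(out, buf + last, size - last);
    memcpy(out + (size - last), buf, cur);
    return (size - last) + cur;
}
```

A caller would remember the offset from the previous poll and pass it as
'last' on the next one.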

[PATCH v3 4/7] x86/vmx: add do_vmtrace_op

2020-06-22 Thread Michał Leszczyński
Provide an interface for privileged domains to manage
external IPT monitoring. Guest IPT state will be preserved
across vmentry/vmexit using ipt_state structure.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/hvm/hvm.c | 168 +
 xen/arch/x86/hvm/vmx/vmx.c |  31 ++
 xen/arch/x86/mm.c  |  28 +
 xen/common/domain.c|   3 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   3 +
 xen/include/asm-x86/hvm/vmx/vmx.h  |  14 +++
 xen/include/public/domctl.h|   1 +
 xen/include/public/hvm/hvm_op.h|  26 +
 xen/include/public/hvm/params.h|   2 +-
 xen/include/public/memory.h|   1 +
 xen/include/xen/sched.h|   4 +
 xen/include/xlat.lst   |   1 +
 12 files changed, 281 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5bb47583b3..5899df52c3 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -58,6 +58,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -606,6 +607,57 @@ static int hvm_print_line(
 return X86EMUL_OKAY;
 }
 
+static int vmtrace_alloc_buffers(struct vcpu *v, uint64_t size)
+{
+struct page_info *pg;
+struct pt_state *pt;
+
+if ( size < PAGE_SIZE || size > GB(4) || (size & (size - 1)) )
+{
+/*
+ * We don't accept trace buffer size smaller than single page
+ * and the upper bound is defined as 4GB in the specification.
+ * The buffer size must be also a power of 2.
+ */
+return -EINVAL;
+}
+
+if ( vmx_add_host_load_msr(v, MSR_RTIT_CTL, 0) )
+return -EFAULT;
+
+pg = alloc_domheap_pages(v->domain, get_order_from_bytes(size),
+ MEMF_no_refcount);
+
+if ( !pg )
+return -ENOMEM;
+
+pt = xzalloc(struct pt_state);
+
+if ( !pt )
+return -ENOMEM;
+
+pt->output_base = page_to_maddr(pg);
+pt->output_mask.raw = size - 1;
+
+v->arch.hvm.vmx.pt_state = pt;
+
+return 0;
+}
+
+static void vmtrace_destroy_buffers(struct vcpu *v)
+{
+struct pt_state *pt = v->arch.hvm.vmx.pt_state;
+
+if ( pt )
+{
+free_domheap_pages(maddr_to_page(pt->output_base),
+   get_order_from_bytes(pt->output_mask.size + 1));
+
+xfree(pt);
+v->arch.hvm.vmx.pt_state = NULL;
+}
+}
+
 int hvm_domain_initialise(struct domain *d)
 {
 unsigned int nr_gsis;
@@ -747,7 +799,10 @@ void hvm_domain_relinquish_resources(struct domain *d)
 hpet_deinit(d);
 
 for_each_vcpu ( d, v )
+{
+vmtrace_destroy_buffers(v);
 hvmemul_cache_destroy(v);
+}
 }
 
 void hvm_domain_destroy(struct domain *d)
@@ -1594,6 +1649,13 @@ int hvm_vcpu_initialise(struct vcpu *v)
 hvm_set_guest_tsc(v, 0);
 }
 
+if ( d->vmtrace_pt_size )
+{
+rc = vmtrace_alloc_buffers(v, d->vmtrace_pt_size);
+if ( rc != 0 )
+goto fail1;
+}
+
 return 0;
 
  fail6:
@@ -4949,6 +5011,108 @@ static int compat_altp2m_op(
 return rc;
 }
 
+CHECK_hvm_vmtrace_op;
+
+static int do_vmtrace_op(XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+struct xen_hvm_vmtrace_op a;
+struct domain *d;
+int rc;
+struct vcpu *v;
+struct pt_state *pt;
+
+if ( !hvm_pt_supported() )
+return -EOPNOTSUPP;
+
+if ( copy_from_guest(&a, arg, 1) )
+return -EFAULT;
+
+if ( a.pad1 || a.pad2 )
+return -EINVAL;
+
+rc = rcu_lock_live_remote_domain_by_id(a.domain, &d);
+
+if ( rc )
+goto out;
+
+if ( !is_hvm_domain(d) )
+{
+rc = -EOPNOTSUPP;
+goto out;
+}
+
+if ( a.vcpu >= d->max_vcpus )
+{
+rc = -EINVAL;
+goto out;
+}
+
+v = domain_vcpu(d, a.vcpu);
+pt = v->arch.hvm.vmx.pt_state;
+
+if ( !pt )
+{
+/* PT must be first initialized upon domain creation. */
+rc = -EINVAL;
+goto out;
+}
+
+switch ( a.cmd )
+{
+case HVMOP_vmtrace_pt_enable:
+vcpu_pause(v);
+spin_lock(>vmtrace_lock);
+if ( vmx_add_guest_msr(v, MSR_RTIT_CTL,
+   RTIT_CTL_TRACEEN | RTIT_CTL_OS |
+   RTIT_CTL_USR | RTIT_CTL_BRANCH_EN) )
+{
+rc = -EFAULT;
+goto out;
+}
+
+pt->active = 1;
+spin_unlock(>vmtrace_lock);
+vcpu_unpause(v);
+break;
+
+case HVMOP_vmtrace_pt_disable:
+vcpu_pause(v);
+spin_lock(>vmtrace_lock);
+
+if ( vmx_del_msr(v, MSR_RTIT_CTL, VMX_MSR_GUEST) )
+{
+rc = -EFAULT;
+goto out;
+}
+
+pt->active = 0;
+spin_unlock(>vmtrace_lock);
+vcpu_unpause(v);
+break;
+
+case HVMOP_vmtrace_pt_get_offset:
+a.offset = pt->output_mask.offset;
+
+if ( __copy_field_to_guest(guest_handle_cast(arg, 
xen_hvm_vmtrace_op_t), , 

[PATCH v3 3/7] x86/vmx: add IPT cpu feature

2020-06-22 Thread Michał Leszczyński
Check if Intel Processor Trace feature is supported by current
processor. Define hvm_ipt_supported function.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/hvm/vmx/vmcs.c | 7 ++-
 xen/include/asm-x86/cpufeature.h| 1 +
 xen/include/asm-x86/hvm/hvm.h   | 9 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  | 1 +
 xen/include/public/arch-x86/cpufeatureset.h | 1 +
 5 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index ca94c2bedc..8c78c906b2 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -291,6 +291,12 @@ static int vmx_init_vmcs_config(void)
 _vmx_cpu_based_exec_control &=
 ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING);
 
+rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
+
+/* Check whether IPT is supported in VMX operation. */
+hvm_funcs.pt_supported = cpu_has_ipt &&
+ (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
+
 if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS )
 {
 min = 0;
@@ -305,7 +311,6 @@ static int vmx_init_vmcs_config(void)
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
SECONDARY_EXEC_XSAVES |
SECONDARY_EXEC_TSC_SCALING);
-rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
 if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
 opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
 if ( opt_vpid_enabled )
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index f790d5c1f8..8d7955dd87 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -104,6 +104,7 @@
 #define cpu_has_clwb            boot_cpu_has(X86_FEATURE_CLWB)
 #define cpu_has_avx512er        boot_cpu_has(X86_FEATURE_AVX512ER)
 #define cpu_has_avx512cd        boot_cpu_has(X86_FEATURE_AVX512CD)
+#define cpu_has_ipt             boot_cpu_has(X86_FEATURE_IPT)
 #define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
 #define cpu_has_avx512bw        boot_cpu_has(X86_FEATURE_AVX512BW)
 #define cpu_has_avx512vl        boot_cpu_has(X86_FEATURE_AVX512VL)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1eb377dd82..8c0d0ece67 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -96,6 +96,9 @@ struct hvm_function_table {
 /* Necessary hardware support for alternate p2m's? */
 bool altp2m_supported;
 
+/* Hardware support for processor tracing? */
+bool pt_supported;
+
 /* Hardware virtual interrupt delivery enable? */
 bool virtual_intr_delivery_enabled;
 
@@ -630,6 +633,12 @@ static inline bool hvm_altp2m_supported(void)
 return hvm_funcs.altp2m_supported;
 }
 
+/* returns true if hardware supports Intel Processor Trace */
+static inline bool hvm_pt_supported(void)
+{
+return hvm_funcs.pt_supported;
+}
+
 /* updates the current hardware p2m */
 static inline void altp2m_vcpu_update_p2m(struct vcpu *v)
 {
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 906810592f..0e9a0b8de6 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -283,6 +283,7 @@ extern u32 vmx_secondary_exec_control;
 #define VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL 0x800ULL
 extern u64 vmx_ept_vpid_cap;
 
+#define VMX_MISC_PT_SUPPORTED                   0x00004000
 #define VMX_MISC_CR3_TARGET                     0x01ff0000
 #define VMX_MISC_VMWRITE_ALL                    0x20000000
 
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 5ca35d9d97..0d3f15f628 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -217,6 +217,7 @@ XEN_CPUFEATURE(SMAP,          5*32+20) /*S  Supervisor Mode Access Prevention */
 XEN_CPUFEATURE(AVX512_IFMA,   5*32+21) /*A  AVX-512 Integer Fused Multiply Add */
 XEN_CPUFEATURE(CLFLUSHOPT,    5*32+23) /*A  CLFLUSHOPT instruction */
 XEN_CPUFEATURE(CLWB,          5*32+24) /*A  CLWB instruction */
+XEN_CPUFEATURE(IPT,           5*32+25) /*   Intel Processor Trace */
 XEN_CPUFEATURE(AVX512PF,      5*32+26) /*A  AVX-512 Prefetch Instructions */
 XEN_CPUFEATURE(AVX512ER,      5*32+27) /*A  AVX-512 Exponent & Reciprocal Instrs */
 XEN_CPUFEATURE(AVX512CD,      5*32+28) /*A  AVX-512 Conflict Detection Instrs */
-- 
2.20.1




[PATCH v3 2/7] x86/vmx: add Intel PT MSR definitions

2020-06-22 Thread Michał Leszczyński
Define constants related to Intel Processor Trace features.

Signed-off-by: Michal Leszczynski 
---
 xen/include/asm-x86/msr-index.h | 37 +
 1 file changed, 37 insertions(+)

diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index b328a47ed8..0203029be9 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -69,6 +69,43 @@
 #define MSR_MCU_OPT_CTRL                    0x0123
 #define  MCU_OPT_CTRL_RNGDS_MITG_DIS        (_AC(1, ULL) <<  0)
 
+/* Intel PT MSRs */
+#define MSR_RTIT_OUTPUT_BASE                0x0560
+
+#define MSR_RTIT_OUTPUT_MASK                0x0561
+
+#define MSR_RTIT_CTL                        0x0570
+#define  RTIT_CTL_TRACEEN                   (_AC(1, ULL) <<  0)
+#define  RTIT_CTL_CYCEN                     (_AC(1, ULL) <<  1)
+#define  RTIT_CTL_OS                        (_AC(1, ULL) <<  2)
+#define  RTIT_CTL_USR                       (_AC(1, ULL) <<  3)
+#define  RTIT_CTL_PWR_EVT_EN                (_AC(1, ULL) <<  4)
+#define  RTIT_CTL_FUP_ON_PTW                (_AC(1, ULL) <<  5)
+#define  RTIT_CTL_FABRIC_EN                 (_AC(1, ULL) <<  6)
+#define  RTIT_CTL_CR3_FILTER                (_AC(1, ULL) <<  7)
+#define  RTIT_CTL_TOPA                      (_AC(1, ULL) <<  8)
+#define  RTIT_CTL_MTC_EN                    (_AC(1, ULL) <<  9)
+#define  RTIT_CTL_TSC_EN                    (_AC(1, ULL) <<  10)
+#define  RTIT_CTL_DIS_RETC                  (_AC(1, ULL) <<  11)
+#define  RTIT_CTL_PTW_EN                    (_AC(1, ULL) <<  12)
+#define  RTIT_CTL_BRANCH_EN                 (_AC(1, ULL) <<  13)
+#define  RTIT_CTL_MTC_FREQ                  (_AC(0x0F, ULL) <<  14)
+#define  RTIT_CTL_CYC_THRESH                (_AC(0x0F, ULL) <<  19)
+#define  RTIT_CTL_PSB_FREQ                  (_AC(0x0F, ULL) <<  24)
+#define  RTIT_CTL_ADDR(n)                   (_AC(0x0F, ULL) <<  (32 + (4 * (n))))
+
+#define MSR_RTIT_STATUS                     0x0571
+#define  RTIT_STATUS_FILTER_EN              (_AC(1, ULL) <<  0)
+#define  RTIT_STATUS_CONTEXT_EN             (_AC(1, ULL) <<  1)
+#define  RTIT_STATUS_TRIGGER_EN             (_AC(1, ULL) <<  2)
+#define  RTIT_STATUS_ERROR                  (_AC(1, ULL) <<  4)
+#define  RTIT_STATUS_STOPPED                (_AC(1, ULL) <<  5)
+#define  RTIT_STATUS_BYTECNT                (_AC(0x1FFFF, ULL) <<  32)
+
+#define MSR_RTIT_CR3_MATCH                  0x0572
+#define MSR_RTIT_ADDR_A(n)                  (0x0580 + (n) * 2)
+#define MSR_RTIT_ADDR_B(n)                  (0x0581 + (n) * 2)
+
 #define MSR_U_CET                           0x06a0
 #define MSR_S_CET                           0x06a2
 #define  CET_SHSTK_EN                       (_AC(1, ULL) <<  0)
-- 
2.20.1




[PATCH v3 1/7] memory: batch processing in acquire_resource()

2020-06-22 Thread Michał Leszczyński
Allow acquire_resource() to handle large resources by processing
items in batches, using hypercall continuation.

Signed-off-by: Michal Leszczynski 
---
 xen/common/memory.c | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c1e5..3ab06581a2 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1046,10 +1046,12 @@ static int acquire_grant_table(struct domain *d, unsigned int id,
 }
 
 static int acquire_resource(
-XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
+XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg,
+unsigned long *start_extent)
 {
 struct domain *d, *currd = current->domain;
 xen_mem_acquire_resource_t xmar;
+uint32_t total_frames;
 /*
  * The mfn_list and gfn_list (below) arrays are ok on stack for the
  * moment since they are small, but if they need to grow in future
@@ -1077,8 +1079,17 @@ static int acquire_resource(
 return 0;
 }
 
+total_frames = xmar.nr_frames;
+
+if ( *start_extent )
+{
+xmar.frame += *start_extent;
+xmar.nr_frames -= *start_extent;
+guest_handle_add_offset(xmar.frame_list, *start_extent);
+}
+
 if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
-return -E2BIG;
+xmar.nr_frames = ARRAY_SIZE(mfn_list);
 
 rc = rcu_lock_remote_domain_by_id(xmar.domid, &d);
 if ( rc )
@@ -1135,6 +1146,14 @@ static int acquire_resource(
 }
 }
 
+if ( !rc )
+{
+*start_extent += xmar.nr_frames;
+
+if ( *start_extent != total_frames )
+rc = -ERESTART;
+}
+
  out:
 rcu_unlock_domain(d);
 
@@ -1600,7 +1619,14 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 case XENMEM_acquire_resource:
 rc = acquire_resource(
-guest_handle_cast(arg, xen_mem_acquire_resource_t));
+guest_handle_cast(arg, xen_mem_acquire_resource_t),
+&start_extent);
+
+if ( rc == -ERESTART )
+return hypercall_create_continuation(
+__HYPERVISOR_memory_op, "lh",
+op | (start_extent << MEMOP_EXTENT_SHIFT), arg);
+
 break;
 
 default:
-- 
2.20.1




[PATCH v3 0/7] Implement support for external IPT monitoring

2020-06-22 Thread Michał Leszczyński
Intel Processor Trace is an architectural extension available in modern Intel
CPUs. It records a detailed trace of activity while the processor executes
code. The recorded trace can be used to reconstruct the code flow, that is, to
find out which code paths were executed, which branches were taken, and so
forth.

The abovementioned feature is described in Intel(R) 64 and IA-32 Architectures 
Software Developer's Manual Volume 3C: System Programming Guide, Part 3, 
Chapter 36: "Intel Processor Trace."

This patch series implements an interface that Dom0 can use to enable IPT
for particular vCPUs in a DomU, allowing for external monitoring. Such a
feature has numerous applications, such as malware monitoring, fuzzing, or
performance testing.

Thanks also to Tamas K Lengyel for a few preliminary hints before
the first version of this patch was submitted to xen-devel.

Changed since v1:
  * MSR_RTIT_CTL is managed using MSR load lists
  * other PT-related MSRs are modified only when vCPU goes out of context
  * trace buffer is now acquired as a resource
  * added vmtrace_pt_size parameter in xl.cfg; the size of the trace buffer
    must be specified at domain creation
  * trace buffers are allocated on domain creation and destroyed on
    domain destruction
  * HVMOP_vmtrace_ipt_enable/disable is limited to enabling/disabling PT;
    these calls no longer manage buffer memory
  * lifted 32 MFN/GFN array limit when acquiring resources
  * minor code style changes according to review

Changed since v2:
  * trace buffer is now allocated on domain creation (in v2 it was
    allocated when hvm param was set)
  * restored 32-item limit in mfn/gfn arrays in acquire_resource
    and instead implemented hypercall continuations
  * code changes according to Jan's and Roger's review

Michal Leszczynski (7):
  memory: batch processing in acquire_resource()
  x86/vmx: add Intel PT MSR definitions
  x86/vmx: add IPT cpu feature
  x86/vmx: add do_vmtrace_op
  tools/libxc: add xc_vmtrace_* functions
  tools/libxl: add vmtrace_pt_size parameter
  tools/proctrace: add proctrace tool

 docs/man/xl.cfg.5.pod.in|  10 +
 tools/golang/xenlight/helpers.gen.go|   2 +
 tools/golang/xenlight/types.gen.go  |   1 +
 tools/libxc/Makefile|   1 +
 tools/libxc/include/xenctrl.h   |  39 +++
 tools/libxc/xc_vmtrace.c|  94 ++
 tools/libxl/libxl_create.c  |   1 +
 tools/libxl/libxl_types.idl |   2 +
 tools/proctrace/COPYING | 339 
 tools/proctrace/Makefile|  50 +++
 tools/proctrace/proctrace.c | 158 +
 tools/xl/xl_parse.c |   4 +
 xen/arch/x86/hvm/hvm.c  | 168 ++
 xen/arch/x86/hvm/vmx/vmcs.c |   7 +-
 xen/arch/x86/hvm/vmx/vmx.c  |  31 ++
 xen/arch/x86/mm.c   |  28 ++
 xen/common/domain.c |   3 +
 xen/common/memory.c |  32 +-
 xen/include/asm-x86/cpufeature.h|   1 +
 xen/include/asm-x86/hvm/hvm.h   |   9 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  |   4 +
 xen/include/asm-x86/hvm/vmx/vmx.h   |  14 +
 xen/include/asm-x86/msr-index.h |  37 +++
 xen/include/public/arch-x86/cpufeatureset.h |   1 +
 xen/include/public/domctl.h |   1 +
 xen/include/public/hvm/hvm_op.h |  26 ++
 xen/include/public/hvm/params.h |   2 +-
 xen/include/public/memory.h |   1 +
 xen/include/xen/sched.h |   4 +
 xen/include/xlat.lst|   1 +
 30 files changed, 1066 insertions(+), 5 deletions(-)
 create mode 100644 tools/libxc/xc_vmtrace.c
 create mode 100644 tools/proctrace/COPYING
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 tools/proctrace/proctrace.c

-- 
2.20.1




Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op

2020-06-22 Thread Michał Leszczyński
> +struct xen_hvm_vmtrace_op {
> +/* IN variable */
> +uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
> +uint32_t cmd;
> +/* Enable/disable external vmtrace for given domain */
> +#define HVMOP_vmtrace_ipt_enable  1
> +#define HVMOP_vmtrace_ipt_disable 2
> +#define HVMOP_vmtrace_ipt_get_offset  3
> +domid_t domain;
> +uint32_t vcpu;
> +uint64_t size;
> +
> +/* OUT variable */
> +uint64_t offset;

 If this is to be a tools-only interface, please use uint64_aligned_t.

>>> 
>>> This type is not defined within hvm_op.h header. What should I do about it?
>> 
>> It gets defined by xen.h, so should be available here. Its
>> definitions live in a
>> 
>> #if defined(__XEN__) || defined(__XEN_TOOLS__)
>> 
>> section, which is what I did recommend to put your interface in
>> as well. Unless you want this to be exposed to the guest itself,
>> at which point further constraints would arise.
>> 

When I put it inside #if defined(__XEN__) || defined(__XEN_TOOLS__),
I get a complaint about the uint64_aligned_compat_t type being missing.

I also can't find a single use of uint64_aligned_t within
this file.


ml



Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op

2020-06-22 Thread Michał Leszczyński
On 22 Jun 2020 at 18:25, Roger Pau Monné roger@citrix.com wrote:

> On Mon, Jun 22, 2020 at 06:16:57PM +0200, Jan Beulich wrote:
>> On 22.06.2020 18:02, Michał Leszczyński wrote:
>> > On 22 Jun 2020 at 17:22, Jan Beulich jbeul...@suse.com wrote:
>> >> On 22.06.2020 16:35, Michał Leszczyński wrote:
>> >>> On 22 Jun 2020 at 15:25, Jan Beulich jbeul...@suse.com wrote:
>> > It's also not "many gigabytes". In most use cases a buffer of 16/32/64 MB
>> > would suffice, I think.
>> 
>> But that one such buffer per vCPU, isn't it? Plus these buffers
>> need to be physically contiguous, which is an additional possibly
>> severe constraint.
> 
> FTR, from my reading of the SDM you can use a mode called ToPA where
> the buffer is some kind of linked list of tables that map to output
> regions. That would be nice, but IMO it should be implemented in a
> next iteration after the basic support is in.
> 
> Roger.

Yes. I keep that in mind but right now I would like to go for the
minimum viable implementation, while ToPA could be added in the next
patch series.


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op

2020-06-22 Thread Michał Leszczyński
On 22 Jun 2020 at 18:16, Jan Beulich jbeul...@suse.com wrote:

> On 22.06.2020 18:02, Michał Leszczyński wrote:
>> On 22 Jun 2020 at 17:22, Jan Beulich jbeul...@suse.com wrote:
>>> On 22.06.2020 16:35, Michał Leszczyński wrote:
>>>> On 22 Jun 2020 at 15:25, Jan Beulich jbeul...@suse.com wrote:
>>>>> Is any of what you do in this switch() actually legitimate without
>>>>> hvm_set_vmtrace_pt_size() having got called for the guest? From
>>>>> remarks elsewhere I imply you expect the param that you currently
>>>>> use to be set upon domain creation time, but at the very least the
>>>>> potentially big buffer should imo not get allocated up front, but
>>>>> only when tracing is to actually be enabled.
>>>>
>>>> Wait... so you want to allocate these buffers in runtime?
>>>> Previously we were talking that there is too much runtime logic
>>>> and these enable/disable hypercalls should be stripped to absolute
>>>> minimum.
>>>
>>> Basic arrangements can be made at domain creation time. I don't
>>> think though that it would be a good use of memory if you
>>> allocated perhaps many gigabytes of memory just for possibly
>>> wanting to enable tracing on a guest.
>>>
>> 
>> From our previous conversations I thought that you want to have
>> as much logic moved to the domain creation as possible.
>> 
>> Thus, a parameter "vmtrace_pt_size" was introduced. By default it's
>> zero (= disabled), if you set it to a non-zero value, then trace buffers
>> of given size will be allocated for the domain and you have possibility
>> to use ipt_enable/ipt_disable at any moment.
>> 
>> This way the runtime logic is as thin as possible. I assume user knows
>> in advance whether he/she would want to use external monitoring with IPT
>> or not.
> 
> Andrew - I think you requested movement to domain_create(). Could
> you clarify if indeed you mean to also allocate the big buffers
> this early?
> 
>> It's also not "many gigabytes". In most use cases a buffer of 16/32/64 MB
>> would suffice, I think.
> 
> But that one such buffer per vCPU, isn't it? Plus these buffers
> need to be physically contiguous, which is an additional possibly
> severe constraint.

Yes. For my use case (VMI stuff) I estimate 16-64 MB per vCPU and for fuzzing
I think it would be even less.

And also yes, these buffers need to be physically contiguous and aligned,
because otherwise the CPU would refuse to use them.
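
To illustrate the constraints mentioned here, a minimal sketch follows. It
mirrors the size check used in the patch (at least one page, at most 4 GB,
power of two) and shows how the mask and an alignment test fall out of a
power-of-two size. The demo_* names, the 4 KiB page size, and the reading of
"aligned" as natural alignment (base being a multiple of the buffer size)
are assumptions made for this example, not taken from the Xen sources.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL        /* assumed 4 KiB pages */
#define DEMO_MAX_SIZE  (4ULL << 30)   /* 4 GB upper bound, as in the patch */

/* True if size is a power of two between one page and 4 GB inclusive. */
static bool demo_pt_size_valid(uint64_t size)
{
    if ( size < DEMO_PAGE_SIZE || size > DEMO_MAX_SIZE )
        return false;

    return (size & (size - 1)) == 0;
}

/* For a power-of-two size, the mask covering the buffer is size - 1. */
static uint64_t demo_pt_output_mask(uint64_t size)
{
    return size - 1;
}

/* A naturally aligned base has no address bits set below the size. */
static bool demo_pt_base_aligned(uint64_t base, uint64_t size)
{
    return (base & (size - 1)) == 0;
}
```

With a 64 MB buffer, for example, the mask is 0x3FFFFFF and the base address
has to be a multiple of 64 MB under the alignment assumption above.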


Best regards,
Michał Leszczyński
CERT Polska




Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op

2020-06-22 Thread Michał Leszczyński
On 22 Jun 2020 at 17:22, Jan Beulich jbeul...@suse.com wrote:

> On 22.06.2020 16:35, Michał Leszczyński wrote:
>> On 22 Jun 2020 at 15:25, Jan Beulich jbeul...@suse.com wrote:
>>> On 19.06.2020 01:41, Michał Leszczyński wrote:
>>>> +
>>>> +domain_pause(d);
>>>
>>> Who's the intended caller of this interface? You making it a hvm-op
>>> suggests the guest may itself call this. But of course a guest
>>> can't pause itself. If this is supposed to be a tools-only interface,
>>> then you should frame it suitably in the public header and of course
>>> you need to enforce this here (which would e.g. mean you shouldn't
>>> use rcu_lock_domain_by_any_id()).
>>>
>> 
>> What should I use instead of rcu_lock_domain_by_any_id()?
> 
> Please take a look at the header where its declaration lives. It's
> admittedly not the usual thing in Xen, but there are even comments
> describing the differences between the four related by-id functions.
> I guess rcu_lock_live_remote_domain_by_id() is the one you want to
> use, despite being puzzled by there being surprisingly little uses
> elsewhere.
> 

Ok, I will correct this.

>>> Also please take a look at hvm/ioreq.c, which makes quite a bit of
>>> use of domain_pause(). In particular I think you want to acquire
>>> the lock only after having paused the domain.
>> 
>> This domain_pause() will be changed to vcpu_pause().
> 
> And you understand that my comment then still applies?

If you mean that we should first call vcpu_pause()
and then acquire spinlock, then yes, this will be corrected in v3.

>>> Is any of what you do in this switch() actually legitimate without
>>> hvm_set_vmtrace_pt_size() having got called for the guest? From
>>> remarks elsewhere I imply you expect the param that you currently
>>> use to be set upon domain creation time, but at the very least the
>>> potentially big buffer should imo not get allocated up front, but
>>> only when tracing is to actually be enabled.
>> 
>> Wait... so you want to allocate these buffers in runtime?
>> Previously we were talking that there is too much runtime logic
>> and these enable/disable hypercalls should be stripped to absolute
>> minimum.
> 
> Basic arrangements can be made at domain creation time. I don't
> think though that it would be a good use of memory if you
> allocated perhaps many gigabytes of memory just for possibly
> wanting to enable tracing on a guest.
> 

From our previous conversations I thought that you wanted to have
as much logic as possible moved to domain creation.

Thus, a parameter "vmtrace_pt_size" was introduced. By default it's
zero (= disabled). If you set it to a non-zero value, trace buffers
of the given size will be allocated for the domain and you can
use ipt_enable/ipt_disable at any moment.

This way the runtime logic is as thin as possible. I assume user knows
in advance whether he/she would want to use external monitoring with IPT
or not.

It's also not "many gigabytes". In most use cases a buffer of 16/32/64 MB
would suffice, I think.

If we want to fall back to the scenario where the trace buffer is
allocated dynamically, then we basically get back to patch v1
implementation.

>>>> +struct xen_hvm_vmtrace_op {
>>>> +/* IN variable */
>>>> +uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
>>>> +uint32_t cmd;
>>>> +/* Enable/disable external vmtrace for given domain */
>>>> +#define HVMOP_vmtrace_ipt_enable  1
>>>> +#define HVMOP_vmtrace_ipt_disable 2
>>>> +#define HVMOP_vmtrace_ipt_get_offset  3
>>>> +domid_t domain;
>>>> +uint32_t vcpu;
>>>> +uint64_t size;
>>>> +
>>>> +/* OUT variable */
>>>> +uint64_t offset;
>>>
>>> If this is to be a tools-only interface, please use uint64_aligned_t.
>>>
>> 
>> This type is not defined within hvm_op.h header. What should I do about it?
> 
> It gets defined by xen.h, so should be available here. Its
> definitions live in a
> 
> #if defined(__XEN__) || defined(__XEN_TOOLS__)
> 
> section, which is what I did recommend to put your interface in
> as well. Unless you want this to be exposed to the guest itself,
> at which point further constraints would arise.
> 
>>> You also want to add an entry to xen/include/xlat.lst and use the
>>> resulting macro to prove that the struct layout is the same for
>>> native and compat callers.
>> 
>> Could you tell a little bit more about this?

Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op

2020-06-22 Thread Michał Leszczyński
On 22 Jun 2020 at 15:25, Jan Beulich jbeul...@suse.com wrote:

> On 19.06.2020 01:41, Michał Leszczyński wrote:
>> +
>> +domain_pause(d);
> 
> Who's the intended caller of this interface? You making it a hvm-op
> suggests the guest may itself call this. But of course a guest
> can't pause itself. If this is supposed to be a tools-only interface,
> then you should frame it suitably in the public header and of course
> you need to enforce this here (which would e.g. mean you shouldn't
> use rcu_lock_domain_by_any_id()).
> 

What should I use instead of rcu_lock_domain_by_any_id()?

> Also please take a look at hvm/ioreq.c, which makes quite a bit of
> use of domain_pause(). In particular I think you want to acquire
> the lock only after having paused the domain.
> 

This domain_pause() will be changed to vcpu_pause().

> Shouldn't you rather remove the MSR from the load list here?
> 

This will be fixed.

> Is any of what you do in this switch() actually legitimate without
> hvm_set_vmtrace_pt_size() having got called for the guest? From
> remarks elsewhere I imply you expect the param that you currently
> use to be set upon domain creation time, but at the very least the
> potentially big buffer should imo not get allocated up front, but
> only when tracing is to actually be enabled.

Wait... so you want to allocate these buffers at runtime?
Previously we discussed that there was too much runtime logic
and that these enable/disable hypercalls should be stripped to the
absolute minimum.


>> --- a/xen/include/public/hvm/hvm_op.h
>> +++ b/xen/include/public/hvm/hvm_op.h
>> @@ -382,6 +382,29 @@ struct xen_hvm_altp2m_op {
>>  typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
>>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
>>  
>> +/* HVMOP_vmtrace: Perform VM tracing related operation */
>> +#define HVMOP_vmtrace 26
>> +
>> +#define HVMOP_VMTRACE_INTERFACE_VERSION 0x0001
> 
> I'm unconvinced we want to introduce yet another versioned interface.
> In any event, as hinted at earlier, this suggests it wants to be a
> tools-only interface instead (which, at least for the time being, is
> not required to be a stable interface then, but that's also something
> we apparently want to move away from, and hence you may better not
> try to rely on it not needing to be stable).

Ok. I will remove the interface version.

> 
>> +struct xen_hvm_vmtrace_op {
>> +/* IN variable */
>> +uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
>> +uint32_t cmd;
>> +/* Enable/disable external vmtrace for given domain */
>> +#define HVMOP_vmtrace_ipt_enable  1
>> +#define HVMOP_vmtrace_ipt_disable 2
>> +#define HVMOP_vmtrace_ipt_get_offset  3
>> +domid_t domain;
>> +uint32_t vcpu;
>> +uint64_t size;
>> +
>> +/* OUT variable */
>> +uint64_t offset;
> 
> If this is to be a tools-only interface, please use uint64_aligned_t.
> 

This type is not defined within hvm_op.h header. What should I do about it?

> You also want to add an entry to xen/include/xlat.lst and use the
> resulting macro to prove that the struct layout is the same for
> native and compat callers.

Could you tell a little bit more about this? What are "native" and
"compat" callers and what is the purpose of this file?
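
As background to this question, here is a sketch of the layout problem an
explicitly aligned 64-bit type is meant to avoid. It is not a description of
Xen's actual compat machinery: all demo_* names are hypothetical, the
attribute syntax is GCC/Clang specific, and the sketch assumes the usual
situation that a 32-bit x86 ABI aligns uint64_t to 4 bytes while a 64-bit
ABI aligns it to 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an explicitly 8-byte-aligned 64-bit type. */
typedef uint64_t __attribute__((aligned(8))) demo_uint64_aligned_t;

/* Layout depends on the ABI: 'offset' may land at byte 4 or at byte 8. */
struct demo_op_plain {
    uint32_t cmd;
    uint64_t offset;
};

/* Layout is the same for 32-bit and 64-bit builds. */
struct demo_op_aligned {
    uint32_t cmd;
    uint32_t pad;                   /* explicit padding, no hidden hole */
    demo_uint64_aligned_t offset;   /* always at byte 8 */
};
```

If a 32-bit caller and a 64-bit hypervisor disagree about where 'offset'
lives, the structure has to be translated on the way in and out. Fixing the
layout up front makes that translation unnecessary for this structure.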


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays

2020-06-21 Thread Michał Leszczyński
On 19 Jun 2020 at 14:39, Jan Beulich jbeul...@suse.com wrote:

> On 19.06.2020 14:35, Michał Leszczyński wrote:
>> On 19 Jun 2020 at 13:34, Roger Pau Monné roger@citrix.com wrote:
>> 
>>> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>>>> Replace on-stack array allocation with heap allocation
>>>> in order to lift the limit of 32 items in mfn/gfn arrays
>>>> when calling acquire_resource.
>>>
>>> I'm afraid this is not correct, you cannot allow unbounded amounts of
>>> items to be processed like this, it's likely that you manage to
>>> trigger the watchdog if the list is long enough, specially when doing
>>> set_foreign_p2m_entry.
>>>
>>> You need to process the items in batches (32 was IMO a good start), and
>>> then add support for hypercall continuations. Take a look at how
>>> XENMEM_populate_physmap just a couple of lines below makes use of
>>> hypercall_create_continuation.
>>>
>>> After processing every batch you need to check if
>>> hypercall_preempt_check returns true and if so use
>>> hypercall_create_continuation in order to encode a continuation.
>> 
>> One more question. Are these continuations transparent from the caller side,
>> or do I also need to add something on the invoker side to properly handle
>> these continuations?
> They are (mostly) transparent to the guest, yes. "Mostly" because we
> have cases (iirc) where the continuation data is stored in a way that
> a guest could observe it. But it still wouldn't need to do anything
> in order for the hypercall to get continued until it completes (which
> may be "fails", faod).
> 
> Jan


Okay, I've managed to implement continuations while still keeping these
arrays small. The operation simply processes at most 32 elements at a time
and creates a continuation until everything has been processed.

This will be in patch v3.
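
To illustrate the batching idea discussed above, here is a minimal standalone sketch. It is not the actual patch: `preempt_requested()` is a hypothetical stand-in for `hypercall_preempt_check()`, `process_item()` is placeholder work, and returning the resume index only models what `hypercall_create_continuation` would encode inside Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE 32u  /* batch size suggested in the review */

/* Hypothetical stand-in for hypercall_preempt_check(): this model
 * pretends that preemption is requested after every batch. */
static bool preempt_requested(void)
{
    return true;
}

/* Hypothetical per-item work (e.g. handling one frame number). */
static void process_item(unsigned long *items, size_t idx)
{
    items[idx] += 1;
}

/*
 * Process items[start..total) in batches of at most BATCH_SIZE.
 * Returns the index to resume from; a return value equal to total
 * means that everything has been processed.
 */
size_t process_batched(unsigned long *items, size_t start, size_t total)
{
    assert(start <= total);

    while ( start < total )
    {
        size_t n = total - start;
        size_t i;

        if ( n > BATCH_SIZE )
            n = BATCH_SIZE;

        for ( i = 0; i < n; i++ )
            process_item(items, start + i);

        start += n;

        /* Inside Xen, this is the point where a continuation would be
         * encoded instead of simply returning the resume index. */
        if ( start < total && preempt_requested() )
            break;
    }

    return start;
}

/* Models the repeated re-entry of the operation until it completes.
 * Returns how many times the operation had to be entered. */
unsigned int run_to_completion(unsigned long *items, size_t total)
{
    size_t done = 0;
    unsigned int entries = 0;

    while ( done < total )
    {
        done = process_batched(items, done, total);
        entries++;
    }

    return entries;
}
```

With 100 items and preemption requested after every batch, the operation in this model is entered four times (32 + 32 + 32 + 4 items), and the on-stack state never needs more than one batch worth of entries.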


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op

2020-06-21 Thread Michał Leszczyński
- On 19 Jun 2020 at 1:41, Michał Leszczyński michal.leszczyn...@cert.pl wrote:

> Provide an interface for privileged domains to manage
> external IPT monitoring. Guest IPT state will be preserved
> across vmentry/vmexit using ipt_state structure.
> 
> Signed-off-by: Michal Leszczynski 
> ---
> xen/arch/x86/hvm/hvm.c | 167 +
> xen/arch/x86/hvm/vmx/vmx.c |  24 +
> xen/arch/x86/mm.c  |  37 +++
> xen/common/domain.c|   1 +
> xen/include/asm-x86/hvm/vmx/vmcs.h |  16 +++
> xen/include/public/hvm/hvm_op.h|  23 
> xen/include/public/hvm/params.h|   5 +-
> xen/include/public/memory.h|   1 +
> xen/include/xen/sched.h|   3 +
> 9 files changed, 276 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 5bb47583b3..145ad053d2 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1612,6 +1612,24 @@ int hvm_vcpu_initialise(struct vcpu *v)
> return rc;
> }
> 
> +void hvm_vmtrace_destroy(struct vcpu *v)
> +{
> +unsigned int i;
> +struct page_info *pg;
> +struct ipt_state *ipt = v->arch.hvm.vmx.ipt_state;
> +mfn_t buf_mfn = ipt->output_base >> PAGE_SHIFT;
> +size_t buf_size = ipt->output_mask.size + 1;
> +
> +xfree(ipt);
> +v->arch.hvm.vmx.ipt_state = NULL;
> +
> +for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
> +{
> +pg = mfn_to_page(_mfn(mfn_add(buf_mfn, i)));
> +free_domheap_page(pg);
> +}
> +}
> +
> void hvm_vcpu_destroy(struct vcpu *v)
> {
> viridian_vcpu_deinit(v);
> @@ -1631,6 +1649,8 @@ void hvm_vcpu_destroy(struct vcpu *v)
> vlapic_destroy(v);
> 
> hvm_vcpu_cacheattr_destroy(v);
> +
> +hvm_vmtrace_destroy(v);
> }
> 
> void hvm_vcpu_down(struct vcpu *v)
> @@ -4066,6 +4086,51 @@ static int hvmop_set_evtchn_upcall_vector(
> return 0;
> }
> 
> +static int hvm_set_vmtrace_pt_size(struct domain *d, uint64_t value)
> +{
> +void *buf;
> +unsigned int buf_order;
> +struct page_info *pg;
> +struct ipt_state *ipt;
> +struct vcpu *v;
> +
> +if ( value < PAGE_SIZE ||
> + value > GB(4) ||
> + ( value & (value - 1) ) ) {
> +/* we don't accept trace buffer size smaller than single page
> + * and the upper bound is defined as 4GB in the specification */
> +return -EINVAL;
> +}
> +
> +for_each_vcpu ( d, v )
> +{
> +buf_order = get_order_from_bytes(value);
> +pg = alloc_domheap_pages(d, buf_order, MEMF_no_refcount);
> +
> +if ( !pg )
> +return -EFAULT;
> +
> +buf = page_to_virt(pg);
> +
> +if ( vmx_add_host_load_msr(v, MSR_RTIT_CTL, 0) )
> +return -EFAULT;
> +
> +ipt = xmalloc(struct ipt_state);
> +
> +if ( !ipt )
> +return -EFAULT;
> +
> +ipt->output_base = virt_to_mfn(buf) << PAGE_SHIFT;
> +ipt->output_mask.raw = value - 1;
> +ipt->status = 0;
> +ipt->active = 0;
> +
> +v->arch.hvm.vmx.ipt_state = ipt;
> +}
> +
> +return 0;
> +}
> +
> static int hvm_allow_set_param(struct domain *d,
>uint32_t index,
>uint64_t new_value)
> @@ -4127,6 +4192,7 @@ static int hvm_allow_set_param(struct domain *d,
> case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
> case HVM_PARAM_ALTP2M:
> case HVM_PARAM_MCA_CAP:
> +case HVM_PARAM_VMTRACE_PT_SIZE:
> if ( value != 0 && new_value != value )
> rc = -EEXIST;
> break;
> @@ -4328,6 +4394,9 @@ static int hvm_set_param(struct domain *d, uint32_t index, uint64_t value)
> case HVM_PARAM_MCA_CAP:
> rc = vmce_enable_mca_cap(d, value);
> break;
> +case HVM_PARAM_VMTRACE_PT_SIZE:
> +rc = hvm_set_vmtrace_pt_size(d, value);
> +break;
> }
> 
> if ( !rc )
> @@ -4949,6 +5018,100 @@ static int compat_altp2m_op(
> return rc;
> }
> 
> +static int do_vmtrace_op(XEN_GUEST_HANDLE_PARAM(void) arg)
> +{
> +struct xen_hvm_vmtrace_op a;
> +struct domain *d;
> +int rc;
> +struct vcpu *v;
> +struct ipt_state *ipt;
> +
> +if ( !hvm_pt_supported() )
> +return -EOPNOTSUPP;
> +
> +if ( copy_from_guest(&a, arg, 1) )
> +return -EFAULT;
> +
> +if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION )
> +return -EINVAL;
> +
> +d = rcu_lock_domain_b

Re: [PATCH v2 3/7] x86/vmx: add IPT cpu feature

2020-06-21 Thread Michał Leszczyński
- On 19 Jun 2020 at 15:44, Roger Pau Monné roger@citrix.com wrote:

> On Fri, Jun 19, 2020 at 01:40:21AM +0200, Michał Leszczyński wrote:
>> Check if Intel Processor Trace feature is supported by current
>> processor. Define hvm_ipt_supported function.
>> 
>> Signed-off-by: Michal Leszczynski 
>> ---
> 
> We usually keep a short list of the changes between versions, so it's
> easier for the reviewers to know what changed. As an example:
> 
> https://lore.kernel.org/xen-devel/20200613184132.11880-1-jul...@xen.org/
> 
>>  xen/arch/x86/hvm/vmx/vmcs.c | 4 
>>  xen/include/asm-x86/cpufeature.h| 1 +
>>  xen/include/asm-x86/hvm/hvm.h   | 9 +
>>  xen/include/asm-x86/hvm/vmx/vmcs.h  | 1 +
>>  xen/include/public/arch-x86/cpufeatureset.h | 1 +
>>  5 files changed, 16 insertions(+)
>> 
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index ca94c2bedc..8466ccb912 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -315,6 +315,10 @@ static int vmx_init_vmcs_config(void)
>>  if ( opt_ept_pml )
>>  opt |= SECONDARY_EXEC_ENABLE_PML;
>>  
>> +/* Check whether IPT is supported in VMX operation */
>> +hvm_funcs.pt_supported = cpu_has_ipt &&
>> +( _vmx_misc_cap & VMX_MISC_PT_SUPPORTED );
> 
> By the placement of this chunk you are tying IPT support to the
> secondary exec availability, but I don't think that's required?
> 
> Ie: You should move the read of misc_cap to the top-level of the
> function and perform the VMX_MISC_PT_SUPPORTED check there also.
> 
> Note that space inside parentheses is only required for conditions of
> 'if', 'for' and those kind of statements, here it's not required, so
> this should be:
> 
>hvm_funcs.pt_supported = cpu_has_ipt &&
> (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
> 
> I also think this should look like:
> 
>if ( !smp_processor_id() )
>   hvm_funcs.pt_supported = cpu_has_ipt &&
> (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
>else if ( hvm_funcs.pt_supported &&
>  !(_vmx_misc_cap & VMX_MISC_PT_SUPPORTED) )
>{
>printk("VMX: IPT capabilities fatally differ between CPU%u and CPU0\n",
>   smp_processor_id());
>return -EINVAL;
>}
> 
> 
> So that you can detect mismatches between CPUs.


I'm afraid this snippet doesn't work. All other CPUs read hvm_funcs.pt_supported
as 0 even though it was set to 1 on CPU 0. I'm not sure whether this is some
multithreading issue or whether there is a separate hvm_funcs for each CPU.

ml


> 
> Thanks, Roger.



Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op

2020-06-21 Thread Michał Leszczyński
- On 19 Jun 2020 at 17:50, Jan Beulich jbeul...@suse.com wrote:

> On 19.06.2020 17:30, Roger Pau Monné wrote:
>> On Fri, Jun 19, 2020 at 01:41:03AM +0200, Michał Leszczyński wrote:
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -1612,6 +1612,24 @@ int hvm_vcpu_initialise(struct vcpu *v)
>>>  return rc;
>>>  }
>>>  
>>> +void hvm_vmtrace_destroy(struct vcpu *v)
>>> +{
>>> +unsigned int i;
>>> +struct page_info *pg;
>>> +struct ipt_state *ipt = v->arch.hvm.vmx.ipt_state;
>>> +mfn_t buf_mfn = ipt->output_base >> PAGE_SHIFT;
>> 
>> Does this build? I think you are missing a _mfn(...) here?
> 
> This as well as ...
> 
>>> +size_t buf_size = ipt->output_mask.size + 1;
>>> +
>>> +xfree(ipt);
>>> +v->arch.hvm.vmx.ipt_state = NULL;
>>> +
>>> +for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
>>> +{
>>> +pg = mfn_to_page(_mfn(mfn_add(buf_mfn, i)));
> 
> ... the extra _mfn() here suggest the code was only ever built in
> release mode so far.
> 
> Jan


Ah, I forgot to enable developer checks. This will be corrected in v3.
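
For context, a simplified standalone sketch of why this only shows up with the stricter checks: as far as I understand, mfn_t is a wrapper struct in debug builds and a plain integer in release builds, so a missing or doubled _mfn() compiles silently only in the latter. The definitions below are reduced stand-ins written for illustration, not the real Xen headers, and `addr_to_mfn()` / `buffer_page()` are hypothetical helper names.

```c
#include <assert.h>

#define PAGE_SHIFT 12

/* Wrapper type: a bare integer can no longer be assigned to mfn_t. */
typedef struct { unsigned long mfn; } mfn_t;

static inline mfn_t _mfn(unsigned long n) { return (mfn_t){ n }; }
static inline unsigned long mfn_x(mfn_t m) { return m.mfn; }
static inline mfn_t mfn_add(mfn_t m, unsigned long i)
{
    return _mfn(mfn_x(m) + i);
}

/* Convert a page-aligned address into a typed frame number. */
mfn_t addr_to_mfn(unsigned long long addr)
{
    /* The address is expected to be page aligned. */
    assert((addr & ((1ULL << PAGE_SHIFT) - 1)) == 0);

    /* "mfn_t m = addr >> PAGE_SHIFT;" is rejected with the struct type,
     * so the conversion has to be spelled out with _mfn(). */
    return _mfn((unsigned long)(addr >> PAGE_SHIFT));
}

/* Frame number of the i-th page of a buffer starting at base. */
mfn_t buffer_page(mfn_t base, unsigned long i)
{
    /* mfn_add() already yields an mfn_t, so wrapping its result in
     * another _mfn() would be a type error with the struct type. */
    return mfn_add(base, i);
}
```

Both issues pointed out in the review map onto this sketch: the first needs an explicit `_mfn()`, the second needs the extra `_mfn()` dropped.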

ml



Re: [PATCH v2 3/7] x86/vmx: add IPT cpu feature

2020-06-19 Thread Michał Leszczyński
- On 19 Jun 2020 at 15:44, Roger Pau Monné roger@citrix.com wrote:

> On Fri, Jun 19, 2020 at 01:40:21AM +0200, Michał Leszczyński wrote:
>> Check if Intel Processor Trace feature is supported by current
>> processor. Define hvm_ipt_supported function.
>> 
>> Signed-off-by: Michal Leszczynski 
>> ---
> 
> We usually keep a short list of the changes between versions, so it's
> easier for the reviewers to know what changed. As an example:
> 
> https://lore.kernel.org/xen-devel/20200613184132.11880-1-jul...@xen.org/
> 

There is a change list in the cover letter. Should I also add a changelog to
each individual patch?


>>  xen/arch/x86/hvm/vmx/vmcs.c | 4 
>>  xen/include/asm-x86/cpufeature.h| 1 +
>>  xen/include/asm-x86/hvm/hvm.h   | 9 +
>>  xen/include/asm-x86/hvm/vmx/vmcs.h  | 1 +
>>  xen/include/public/arch-x86/cpufeatureset.h | 1 +
>>  5 files changed, 16 insertions(+)
>> 
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index ca94c2bedc..8466ccb912 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -315,6 +315,10 @@ static int vmx_init_vmcs_config(void)
>>  if ( opt_ept_pml )
>>  opt |= SECONDARY_EXEC_ENABLE_PML;
>>  
>> +/* Check whether IPT is supported in VMX operation */
>> +hvm_funcs.pt_supported = cpu_has_ipt &&
>> +( _vmx_misc_cap & VMX_MISC_PT_SUPPORTED );
> 
> By the placement of this chunk you are tying IPT support to the
> secondary exec availability, but I don't think that's required?
> 
> Ie: You should move the read of misc_cap to the top-level of the
> function and perform the VMX_MISC_PT_SUPPORTED check there also.
> 
> Note that space inside parentheses is only required for conditions of
> 'if', 'for' and those kind of statements, here it's not required, so
> this should be:
> 
>hvm_funcs.pt_supported = cpu_has_ipt &&
> (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
> 
> I also think this should look like:
> 
>if ( !smp_processor_id() )
>   hvm_funcs.pt_supported = cpu_has_ipt &&
> (_vmx_misc_cap & VMX_MISC_PT_SUPPORTED);
>else if ( hvm_funcs.pt_supported &&
>  !(_vmx_misc_cap & VMX_MISC_PT_SUPPORTED) )
>{
>printk("VMX: IPT capabilities fatally differ between CPU%u and CPU0\n",
>   smp_processor_id());
>return -EINVAL;
>}
> 
> 
> So that you can detect mismatches between CPUs.


I will fix this.


> 
> Thanks, Roger.



Re: [PATCH for-4.14] x86/msr: Disallow access to Processor Trace MSRs

2020-06-19 Thread Michał Leszczyński
- On 19 Jun 2020 at 14:49, Jan Beulich jbeul...@suse.com wrote:

> On 19.06.2020 14:10, Michał Leszczyński wrote:
>> - On 19 Jun 2020 at 13:58, Andrew Cooper andrew.coop...@citrix.com wrote:
>> 
>>> We do not expose the feature to guests, so should disallow access to the
>>> respective MSRs.
>>>
>>> Signed-off-by: Andrew Cooper 
>>> ---
>>> CC: Jan Beulich 
>>> CC: Wei Liu 
>>> CC: Roger Pau Monné 
>>> CC: Paul Durrant 
>>> CC: Michał Leszczyński 
>>>
>>> Paul: For 4.14.  This needs backporting to older trees as well.
>>>
>>> Michał: CC'ing, just to keep you in the loop.  Xen has some dubious default
>>> MSR semantics which we're still in the middle of untangling in a backwards
>>> compatible way.  Patches like this will eventually not be necessary, but
>>> they are for now.
>> 
>> 
>> As for external IPT monitoring, it would be best if the VM would think
>> that IPT is simply not supported at all by the underlying hypervisor.
> 
> This is already the case, isn't it? Yet not reporting a feature may
> not keep a guest from trying to access the respective MSRs.
> 
> Jan


Okay, understood :)

ml



Re: [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays

2020-06-19 Thread Michał Leszczyński
- On 19 Jun 2020 at 13:34, Roger Pau Monné roger@citrix.com wrote:

> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>> Replace on-stack array allocation with heap allocation
>> in order to lift the limit of 32 items in mfn/gfn arrays
>> when calling acquire_resource.
> 
> I'm afraid this is not correct, you cannot allow unbounded amounts of
> items to be processed like this, it's likely that you manage to
> trigger the watchdog if the list is long enough, specially when doing
> set_foreign_p2m_entry.
> 
> You need to process the items in batches (32 was IMO a good start), and
> then add support for hypercall continuations. Take a look at how
> XENMEM_populate_physmap just a couple of lines below makes use of
> hypercall_create_continuation.
> 
> After processing every batch you need to check if
> hypercall_preempt_check returns true and if so use
> hypercall_create_continuation in order to encode a continuation.
> 
> Thanks, Roger.


One more question. Are these continuations transparent from the caller side,
or do I also need to add something on the invoker side to properly handle these
continuations?


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH for-4.14] x86/msr: Disallow access to Processor Trace MSRs

2020-06-19 Thread Michał Leszczyński
- On 19 Jun 2020 at 13:58, Andrew Cooper andrew.coop...@citrix.com wrote:

> We do not expose the feature to guests, so should disallow access to the
> respective MSRs.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Paul Durrant 
> CC: Michał Leszczyński 
> 
> Paul: For 4.14.  This needs backporting to older trees as well.
> 
> Michał: CC'ing, just to keep you in the loop.  Xen has some dubious default
> MSR semantics which we're still in the middle of untangling in a backwards
> compatible way.  Patches like this will eventually not be necessary, but they
> are for now.


As for external IPT monitoring, it would be best if the VM would think
that IPT is simply not supported at all by the underlying hypervisor.


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays

2020-06-19 Thread Michał Leszczyński
- On 19 Jun 2020 at 13:48, Jan Beulich jbeul...@suse.com wrote:

> On 19.06.2020 13:36, Michał Leszczyński wrote:
>> - On 19 Jun 2020 at 13:34, Roger Pau Monné roger@citrix.com wrote:
>> 
>>> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>>>> Replace on-stack array allocation with heap allocation
>>>> in order to lift the limit of 32 items in mfn/gfn arrays
>>>> when calling acquire_resource.
>>>
>>> I'm afraid this is not correct, you cannot allow unbounded amounts of
>>> items to be processed like this, it's likely that you manage to
>>> trigger the watchdog if the list is long enough, specially when doing
>>> set_foreign_p2m_entry.
>>>
>>> You need to process the items in batches (32 was IMO a good start), and
>>> then add support for hypercall continuations. Take a look at how
>>> XENMEM_populate_physmap just a couple of lines below makes use of
>>> hypercall_create_continuation.
>>>
>>> After processing every batch you need to check if
>>> hypercall_preempt_check returns true and if so use
>>> hypercall_create_continuation in order to encode a continuation.
>>>
>>> Thanks, Roger.
>> 
>> 
>> Somebody previously suggested that this limit could be lifted this way,
>> so I would like to hear some more opinions on that.
> 
> I did suggest the limit can be lifted, but not by processing all
> pieces in one go. Whether batches of 32 or 64 or 128 are chosen
> is a different thing, but you can't do arbitrary amounts without
> any preemption checks.
> 
> Jan


Okay. I will try to correct it within v3.

Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays

2020-06-19 Thread Michał Leszczyński
- On 19 Jun 2020 at 13:34, Roger Pau Monné roger@citrix.com wrote:

> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>> Replace on-stack array allocation with heap allocation
>> in order to lift the limit of 32 items in mfn/gfn arrays
>> when calling acquire_resource.
> 
> I'm afraid this is not correct, you cannot allow unbounded amounts of
> items to be processed like this, it's likely that you manage to
> trigger the watchdog if the list is long enough, specially when doing
> set_foreign_p2m_entry.
> 
> You need to process the items in batches (32 was IMO a good start), and
> then add support for hypercall continuations. Take a look at how
> XENMEM_populate_physmap just a couple of lines below makes use of
> hypercall_create_continuation.
> 
> After processing every batch you need to check if
> hypercall_preempt_check returns true and if so use
> hypercall_create_continuation in order to encode a continuation.
> 
> Thanks, Roger.


Somebody previously suggested that this limit could be lifted this way,
so I would like to hear some more opinions on that.


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v2 4/7] x86/vmx: add do_vmtrace_op

2020-06-18 Thread Michał Leszczyński
- On 19 Jun 2020 at 1:41, Michał Leszczyński michal.leszczyn...@cert.pl wrote:

> Provide an interface for privileged domains to manage
> external IPT monitoring. Guest IPT state will be preserved
> across vmentry/vmexit using ipt_state structure.
> 
> Signed-off-by: Michal Leszczynski 
> ---

...

> +struct xen_hvm_vmtrace_op {
> +/* IN variable */
> +uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
> +uint32_t cmd;
> +/* Enable/disable external vmtrace for given domain */
> +#define HVMOP_vmtrace_ipt_enable  1
> +#define HVMOP_vmtrace_ipt_disable 2
> +#define HVMOP_vmtrace_ipt_get_offset  3
> +domid_t domain;
> +uint32_t vcpu;
> +uint64_t size;
> +
> +/* OUT variable */
> +uint64_t offset;
> +};
> +typedef struct xen_hvm_vmtrace_op xen_hvm_vmtrace_op_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_vmtrace_op_t);
> +

I've forgotten about the padding thing here. This will be fixed in the next 
patch version, sorry.
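
For illustration, one way the padding could be made explicit is sketched below. This is only an assumed layout for discussion, not necessarily what v3 will contain: the struct name `vmtrace_op_padded`, the `pad1` field and the helper function are hypothetical, and domid_t is taken to be a 16-bit type as in the public headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t domid_t;   /* 16-bit domain identifier */

/* Hypothetical layout with every padding byte spelled out, so that the
 * structure looks the same regardless of the compiler's alignment rules. */
struct vmtrace_op_padded {
    uint32_t version;
    uint32_t cmd;
    domid_t  domain;
    uint16_t pad1;      /* explicit: fills the gap before vcpu */
    uint32_t vcpu;
    uint64_t size;
    uint64_t offset;
};

/* Compile-time checks that the layout is what the comments claim. */
_Static_assert(offsetof(struct vmtrace_op_padded, vcpu) == 12,
               "vcpu must follow the explicit padding");
_Static_assert(offsetof(struct vmtrace_op_padded, size) == 16,
               "size must start at a 64-bit aligned offset");
_Static_assert(sizeof(struct vmtrace_op_padded) == 32,
               "no hidden padding expected");

/* Returns 1 when the member sizes add up to the structure size, i.e.
 * when the compiler has not inserted any padding of its own. */
int vmtrace_op_has_no_implicit_padding(void)
{
    size_t members = sizeof(uint32_t) * 3   /* version, cmd, vcpu */
                   + sizeof(domid_t)        /* domain */
                   + sizeof(uint16_t)       /* pad1 */
                   + sizeof(uint64_t) * 2;  /* size, offset */

    return members == sizeof(struct vmtrace_op_padded);
}
```

Without `pad1`, the two bytes after `domain` would still be there, but as compiler-inserted padding with unspecified contents.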

ml



Re: [PATCH v2 0/7] Implement support for external IPT monitoring

2020-06-18 Thread Michał Leszczyński
- On 19 Jun 2020 at 1:34, Michał Leszczyński michal.leszczyn...@cert.pl wrote:

> Intel Processor Trace is an architectural extension available in modern Intel
> family CPUs. It allows recording the detailed trace of activity while the
> processor executes the code. One might use the recorded trace to reconstruct
> the code flow. It means, to find out the executed code paths, determine
> branches taken, and so forth.
> 
> The abovementioned feature is described in Intel(R) 64 and IA-32 Architectures
> Software Developer's Manual Volume 3C: System Programming Guide, Part 3,
> Chapter 36: "Intel Processor Trace."
> 
> This patch series implements an interface that Dom0 could use in order to
> enable IPT for particular vCPUs in DomU, allowing for external monitoring.
> Such a feature has numerous applications like malware monitoring, fuzzing, or
> performance testing.
> 
> Also thanks to Tamas K Lengyel for a few preliminary hints before
> first version of this patch was submitted to xen-devel.
> 
> Changed since v1:
>  * MSR_RTIT_CTL is managed using MSR load lists
>  * other PT-related MSRs are modified only when vCPU goes out of context
>  * trace buffer is now acquired as a resource
>  * added vmtrace_pt_size parameter in xl.cfg, the size of trace buffer
>must be specified in the moment of domain creation
>  * trace buffers are allocated on domain creation, destructed on
>domain destruction
>  * HVMOP_vmtrace_ipt_enable/disable is limited to enabling/disabling PT
>these calls don't manage buffer memory anymore
>  * lifted 32 MFN/GFN array limit when acquiring resources
>  * minor code style changes according to review
> 
> Michal Leszczynski (7):
>  xen/mm: lift 32 item limit from mfn/gfn arrays
>  x86/vmx: add Intel PT MSR definitions
>  x86/vmx: add IPT cpu feature
>  x86/vmx: add do_vmtrace_op
>  tools/libxc: add xc_vmtrace_* functions
>  tools/libxl: add vmtrace_pt_size parameter
>  tools/proctrace: add proctrace tool
> 
> tools/golang/xenlight/helpers.gen.go|   2 +
> tools/golang/xenlight/types.gen.go  |   1 +
> tools/libxc/Makefile|   1 +
> tools/libxc/include/xenctrl.h   |  39 +++
> tools/libxc/xc_vmtrace.c|  97 ++
> tools/libxl/libxl_types.idl |   2 +
> tools/libxl/libxl_x86.c |   5 +
> tools/proctrace/COPYING | 339 
> tools/proctrace/Makefile|  50 +++
> tools/proctrace/proctrace.c | 153 +
> tools/xl/xl_parse.c |   4 +
> xen/arch/x86/hvm/hvm.c  | 167 ++
> xen/arch/x86/hvm/vmx/vmcs.c |   4 +
> xen/arch/x86/hvm/vmx/vmx.c  |  24 ++
> xen/arch/x86/mm.c   |  37 +++
> xen/common/domain.c |   1 +
> xen/common/memory.c |  39 +--
> xen/include/asm-x86/cpufeature.h|   1 +
> xen/include/asm-x86/hvm/hvm.h   |   9 +
> xen/include/asm-x86/hvm/vmx/vmcs.h  |  17 +
> xen/include/asm-x86/msr-index.h |  37 +++
> xen/include/public/arch-x86/cpufeatureset.h |   1 +
> xen/include/public/hvm/hvm_op.h |  23 ++
> xen/include/public/hvm/params.h |   5 +-
> xen/include/public/memory.h |   1 +
> xen/include/xen/sched.h |   3 +
> 26 files changed, 1039 insertions(+), 23 deletions(-)
> create mode 100644 tools/libxc/xc_vmtrace.c
> create mode 100644 tools/proctrace/COPYING
> create mode 100644 tools/proctrace/Makefile
> create mode 100644 tools/proctrace/proctrace.c
> 
> --
> 2.20.1


Thanks for all comments related to v1. I did my best to address all of them and
thus almost all code was altered. Due to that, I've decided to post the next
version at this stage.
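
As a quick illustration of the constraint on the trace buffer size in this version (at least one page, at most 4 GB, and a power of two), here is a standalone sketch of the check. The function names and the `MAX_PT_BUF_SIZE` constant are made up for this example, and PAGE_SIZE is assumed to be 4096.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE       4096ULL
#define MAX_PT_BUF_SIZE (4ULL << 30)   /* 4 GB upper bound */

/* Returns true if size is acceptable as a per-vCPU trace buffer size. */
bool vmtrace_pt_size_valid(uint64_t size)
{
    if ( size < PAGE_SIZE || size > MAX_PT_BUF_SIZE )
        return false;

    /* A power of two has exactly one bit set, so clearing the lowest
     * set bit with size & (size - 1) must leave zero. */
    return (size & (size - 1)) == 0;
}

/* For a valid size, the mask covering all buffer offsets is size - 1. */
uint64_t vmtrace_pt_output_mask(uint64_t size)
{
    assert(vmtrace_pt_size_valid(size));
    return size - 1;
}
```

For example, 8192 is accepted and yields the mask 8191, while 12288 (three pages) is rejected because it is not a power of two.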



[PATCH v2 7/7] tools/proctrace: add proctrace tool

2020-06-18 Thread Michał Leszczyński
Add a demonstration tool that uses xc_vmtrace_* calls in order
to manage external IPT monitoring for DomU.

Signed-off-by: Michal Leszczynski 
---
 tools/proctrace/COPYING | 339 
 tools/proctrace/Makefile|  50 ++
 tools/proctrace/proctrace.c | 153 
 3 files changed, 542 insertions(+)
 create mode 100644 tools/proctrace/COPYING
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 tools/proctrace/proctrace.c

diff --git a/tools/proctrace/COPYING b/tools/proctrace/COPYING
new file mode 100644
index 00..c0a841112c
--- /dev/null
+++ b/tools/proctrace/COPYING
@@ -0,0 +1,339 @@
+   GNU GENERAL PUBLIC LICENSE
+  Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+   59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+   Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it.  By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users.  This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it.  (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.)  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+  To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have.  You must make sure that they, too, receive or can get the
+source code.  And you must show them these terms so they know their
+rights.
+
+  We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+  Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software.  If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+  Finally, any free program is threatened constantly by software
+patents.  We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary.  To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+   GNU GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License.  The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language.  (Hereinafter, translation is included without limitation in
+the term "modification".)  Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope.  The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+  1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and 

[PATCH v2 5/7] tools/libxc: add xc_vmtrace_* functions

2020-06-18 Thread Michał Leszczyński
Add functions in libxc that use the new HVMOP_vmtrace interface.

Signed-off-by: Michal Leszczynski 
---
 tools/libxc/Makefile  |  1 +
 tools/libxc/include/xenctrl.h | 39 ++
 tools/libxc/xc_vmtrace.c  | 97 +++
 3 files changed, 137 insertions(+)
 create mode 100644 tools/libxc/xc_vmtrace.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index fae5969a73..605e44501d 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -27,6 +27,7 @@ CTRL_SRCS-y   += xc_csched2.c
 CTRL_SRCS-y   += xc_arinc653.c
 CTRL_SRCS-y   += xc_rt.c
 CTRL_SRCS-y   += xc_tbuf.c
+CTRL_SRCS-y   += xc_vmtrace.c
 CTRL_SRCS-y   += xc_pm.c
 CTRL_SRCS-y   += xc_cpu_hotplug.c
 CTRL_SRCS-y   += xc_resume.c
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 113ddd935d..101cc9b712 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1585,6 +1585,45 @@ int xc_tbuf_set_cpu_mask(xc_interface *xch, xc_cpumap_t mask);
 
 int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask);
 
+/**
+ * Enable Intel Processor Trace for given vCPU in given DomU.
+ * Allocate the trace ringbuffer with given size.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_ipt_enable(xc_interface *xch, uint32_t domid,
+  uint32_t vcpu);
+
+/**
+ * Disable Intel Processor Trace for given vCPU in given DomU.
+ * Deallocate the trace ringbuffer.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_ipt_disable(xc_interface *xch, uint32_t domid,
+   uint32_t vcpu);
+
+/**
+ * Get current offset inside the trace ringbuffer.
+ * This allows to determine how much data was written into the buffer.
+ * Once buffer overflows, the offset will reset to 0 and the previous
+ * data will be overridden.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @parm offset current offset inside trace buffer will be written there
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_ipt_get_offset(xc_interface *xch, uint32_t domid,
+  uint32_t vcpu, uint64_t *offset);
+
 int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
 
diff --git a/tools/libxc/xc_vmtrace.c b/tools/libxc/xc_vmtrace.c
new file mode 100644
index 00..5f0551ad71
--- /dev/null
+++ b/tools/libxc/xc_vmtrace.c
@@ -0,0 +1,97 @@
+/**
+ * xc_vmtrace.c
+ *
+ * API for manipulating hardware tracing features
+ *
+ * Copyright (c) 2020, Michal Leszczynski
+ *
+ * Copyright 2020 CERT Polska. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see .
+ */
+
+#include "xc_private.h"
+#include 
+
+int xc_vmtrace_ipt_enable(
+xc_interface *xch, uint32_t domid, uint32_t vcpu)
+{
+DECLARE_HYPERCALL_BUFFER(xen_hvm_vmtrace_op_t, arg);
+int rc = -1;
+
+arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+if ( arg == NULL )
+return -1;
+
+arg->version = HVMOP_VMTRACE_INTERFACE_VERSION;
+arg->cmd = HVMOP_vmtrace_ipt_enable;
+arg->domain = domid;
+arg->vcpu = vcpu;
+
+rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op, HVMOP_vmtrace,
+  HYPERCALL_BUFFER_AS_ARG(arg));
+
+xc_hypercall_buffer_free(xch, arg);
+return rc;
+}
+
+int xc_vmtrace_ipt_get_offset(
+xc_interface *xch, uint32_t domid, uint32_t vcpu, uint64_t *offset)
+{
+DECLARE_HYPERCALL_BUFFER(xen_hvm_vmtrace_op_t, arg);
+int rc = -1;
+
+arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+if ( arg == NULL )
+return -1;
+
+arg->version = HVMOP_VMTRACE_INTERFACE_VERSION;
+arg->cmd = HVMOP_vmtrace_ipt_get_offset;
+arg->domain = domid;
+arg->vcpu = vcpu;
+
+rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op, HVMOP_vmtrace,
+  HYPERCALL_BUFFER_AS_ARG(arg));
+
+if ( rc == 0 )
+{
+*offset = 

[PATCH v2 6/7] tools/libxl: add vmtrace_pt_size parameter

2020-06-18 Thread Michał Leszczyński
Allow specifying the size of the per-vCPU trace buffer upon
domain creation. This is zero by default (meaning: not enabled).

Signed-off-by: Michal Leszczynski 
---
 tools/golang/xenlight/helpers.gen.go | 2 ++
 tools/golang/xenlight/types.gen.go   | 1 +
 tools/libxl/libxl_types.idl  | 2 ++
 tools/libxl/libxl_x86.c  | 5 +
 tools/xl/xl_parse.c  | 4 
 5 files changed, 14 insertions(+)

diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go
index 935d3bc50a..986ebbd681 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1117,6 +1117,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.Altp2M = Altp2MMode(xc.altp2m)
+x.VmtracePtSize = int(xc.vmtrace_pt_size)
 
  return nil}
 
@@ -1592,6 +1593,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.altp2m = C.libxl_altp2m_mode(x.Altp2M)
+xc.vmtrace_pt_size = C.int(x.VmtracePtSize)
 
  return nil
  }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index 663c1e86b4..41ec7cdd32 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -516,6 +516,7 @@ GicVersion GicVersion
 Vuart VuartType
 }
 Altp2M Altp2MMode
+VmtracePtSize int
 }
 
 type domainBuildInfoTypeUnion interface {
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9d3f05f399..04c1704b72 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -645,6 +645,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
 # supported by x86 HVM and ARM support is planned.
 ("altp2m", libxl_altp2m_mode),
 
+("vmtrace_pt_size", integer),
+
 ], dir=DIR_IN,
copy_deprecated_fn="libxl__domain_build_info_copy_deprecated",
 )
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index e57f63282e..14be2b395a 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -404,6 +404,11 @@ static int hvm_set_conf_params(libxl__gc *gc, uint32_t 
domid,
 libxl_defbool_val(info->u.hvm.altp2m))
 altp2m = libxl_defbool_val(info->u.hvm.altp2m);
 
+if (xc_hvm_param_set(xch, domid, HVM_PARAM_VMTRACE_PT_SIZE,
+ info->vmtrace_pt_size)) {
+LOG(ERROR, "Couldn't set HVM_PARAM_VMTRACE_PT_SIZE");
+goto out;
+}
 if (xc_hvm_param_set(xch, domid, HVM_PARAM_HPET_ENABLED,
  libxl_defbool_val(info->u.hvm.hpet))) {
 LOG(ERROR, "Couldn't set HVM_PARAM_HPET_ENABLED");
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 61b4ef7b7e..6ab98dda55 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1861,6 +1861,10 @@ void parse_config_data(const char *config_source,
 }
 }
 
+if (!xlu_cfg_get_long(config, "vmtrace_pt_size", &l, 1)) {
+b_info->vmtrace_pt_size = l;
+}
+
 if (!xlu_cfg_get_list(config, "ioports", &ioports, &num_ioports, 0)) {
 b_info->num_ioports = num_ioports;
 b_info->ioports = calloc(num_ioports, sizeof(*b_info->ioports));
-- 
2.20.1




[PATCH v2 4/7] x86/vmx: add do_vmtrace_op

2020-06-18 Thread Michał Leszczyński
Provide an interface for privileged domains to manage
external IPT monitoring. Guest IPT state will be preserved
across vmentry/vmexit using ipt_state structure.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/hvm/hvm.c | 167 +
 xen/arch/x86/hvm/vmx/vmx.c |  24 +
 xen/arch/x86/mm.c  |  37 +++
 xen/common/domain.c|   1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |  16 +++
 xen/include/public/hvm/hvm_op.h|  23 
 xen/include/public/hvm/params.h|   5 +-
 xen/include/public/memory.h|   1 +
 xen/include/xen/sched.h|   3 +
 9 files changed, 276 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5bb47583b3..145ad053d2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1612,6 +1612,24 @@ int hvm_vcpu_initialise(struct vcpu *v)
 return rc;
 }
 
+void hvm_vmtrace_destroy(struct vcpu *v)
+{
+unsigned int i;
+struct page_info *pg;
+struct ipt_state *ipt = v->arch.hvm.vmx.ipt_state;
+mfn_t buf_mfn = ipt->output_base >> PAGE_SHIFT;
+size_t buf_size = ipt->output_mask.size + 1;
+
+xfree(ipt);
+v->arch.hvm.vmx.ipt_state = NULL;
+
+for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
+{
+pg = mfn_to_page(_mfn(mfn_add(buf_mfn, i)));
+free_domheap_page(pg);
+}
+}
+
 void hvm_vcpu_destroy(struct vcpu *v)
 {
 viridian_vcpu_deinit(v);
@@ -1631,6 +1649,8 @@ void hvm_vcpu_destroy(struct vcpu *v)
 vlapic_destroy(v);
 
 hvm_vcpu_cacheattr_destroy(v);
+
+hvm_vmtrace_destroy(v);
 }
 
 void hvm_vcpu_down(struct vcpu *v)
@@ -4066,6 +4086,51 @@ static int hvmop_set_evtchn_upcall_vector(
 return 0;
 }
 
+static int hvm_set_vmtrace_pt_size(struct domain *d, uint64_t value)
+{
+void *buf;
+unsigned int buf_order;
+struct page_info *pg;
+struct ipt_state *ipt;
+struct vcpu *v;
+
+if ( value < PAGE_SIZE ||
+ value > GB(4) ||
+ ( value & (value - 1) ) ) {
+/* we don't accept trace buffer size smaller than single page
+ * and the upper bound is defined as 4GB in the specification */
+return -EINVAL;
+}
+
+for_each_vcpu ( d, v )
+{
+buf_order = get_order_from_bytes(value);
+pg = alloc_domheap_pages(d, buf_order, MEMF_no_refcount);
+
+if ( !pg )
+return -EFAULT;
+
+buf = page_to_virt(pg);
+
+if ( vmx_add_host_load_msr(v, MSR_RTIT_CTL, 0) )
+return -EFAULT;
+
+ipt = xmalloc(struct ipt_state);
+
+if ( !ipt )
+return -EFAULT;
+
+ipt->output_base = virt_to_mfn(buf) << PAGE_SHIFT;
+ipt->output_mask.raw = value - 1;
+ipt->status = 0;
+ipt->active = 0;
+
+v->arch.hvm.vmx.ipt_state = ipt;
+}
+
+return 0;
+}
+
 static int hvm_allow_set_param(struct domain *d,
uint32_t index,
uint64_t new_value)
@@ -4127,6 +4192,7 @@ static int hvm_allow_set_param(struct domain *d,
 case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
 case HVM_PARAM_ALTP2M:
 case HVM_PARAM_MCA_CAP:
+case HVM_PARAM_VMTRACE_PT_SIZE:
 if ( value != 0 && new_value != value )
 rc = -EEXIST;
 break;
@@ -4328,6 +4394,9 @@ static int hvm_set_param(struct domain *d, uint32_t 
index, uint64_t value)
 case HVM_PARAM_MCA_CAP:
 rc = vmce_enable_mca_cap(d, value);
 break;
+case HVM_PARAM_VMTRACE_PT_SIZE:
+rc = hvm_set_vmtrace_pt_size(d, value);
+break;
 }
 
 if ( !rc )
@@ -4949,6 +5018,100 @@ static int compat_altp2m_op(
 return rc;
 }
 
+static int do_vmtrace_op(XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+struct xen_hvm_vmtrace_op a;
+struct domain *d;
+int rc;
+struct vcpu *v;
+struct ipt_state *ipt;
+
+if ( !hvm_pt_supported() )
+return -EOPNOTSUPP;
+
+if ( copy_from_guest(&a, arg, 1) )
+return -EFAULT;
+
+if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION )
+return -EINVAL;
+
+d = rcu_lock_domain_by_any_id(a.domain);
+spin_lock(&d->vmtrace_lock);
+
+if ( d == NULL )
+return -ESRCH;
+
+if ( !is_hvm_domain(d) )
+{
+rc = -EOPNOTSUPP;
+goto out;
+}
+
+domain_pause(d);
+
+if ( a.vcpu >= d->max_vcpus )
+{
+rc = -EINVAL;
+goto out;
+}
+
+v = d->vcpu[a.vcpu];
+ipt = v->arch.hvm.vmx.ipt_state;
+
+if ( !ipt )
+{
+/*
+* PT must be first initialized upon domain creation.
+*/
+rc = -EINVAL;
+goto out;
+}
+
+switch ( a.cmd )
+{
+case HVMOP_vmtrace_ipt_enable:
+if ( vmx_add_guest_msr(v, MSR_RTIT_CTL,
+   RTIT_CTL_TRACEEN | RTIT_CTL_OS |
+   RTIT_CTL_USR | RTIT_CTL_BRANCH_EN) )
+{
+rc = -EFAULT;
+goto out;
+}

[PATCH v2 3/7] x86/vmx: add IPT cpu feature

2020-06-18 Thread Michał Leszczyński
Check whether the Intel Processor Trace feature is supported by the
current processor. Define the hvm_pt_supported function.

Signed-off-by: Michal Leszczynski 
---
 xen/arch/x86/hvm/vmx/vmcs.c | 4 
 xen/include/asm-x86/cpufeature.h| 1 +
 xen/include/asm-x86/hvm/hvm.h   | 9 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  | 1 +
 xen/include/public/arch-x86/cpufeatureset.h | 1 +
 5 files changed, 16 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index ca94c2bedc..8466ccb912 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -315,6 +315,10 @@ static int vmx_init_vmcs_config(void)
 if ( opt_ept_pml )
 opt |= SECONDARY_EXEC_ENABLE_PML;
 
+/* Check whether IPT is supported in VMX operation */
+hvm_funcs.pt_supported = cpu_has_ipt &&
+( _vmx_misc_cap & VMX_MISC_PT_SUPPORTED );
+
 /*
  * "APIC Register Virtualization" and "Virtual Interrupt Delivery"
  * can be set only when "use TPR shadow" is set
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index f790d5c1f8..8d7955dd87 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -104,6 +104,7 @@
 #define cpu_has_clwb            boot_cpu_has(X86_FEATURE_CLWB)
 #define cpu_has_avx512er        boot_cpu_has(X86_FEATURE_AVX512ER)
 #define cpu_has_avx512cd        boot_cpu_has(X86_FEATURE_AVX512CD)
+#define cpu_has_ipt             boot_cpu_has(X86_FEATURE_IPT)
 #define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
 #define cpu_has_avx512bw        boot_cpu_has(X86_FEATURE_AVX512BW)
 #define cpu_has_avx512vl        boot_cpu_has(X86_FEATURE_AVX512VL)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1eb377dd82..8c0d0ece67 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -96,6 +96,9 @@ struct hvm_function_table {
 /* Necessary hardware support for alternate p2m's? */
 bool altp2m_supported;
 
+/* Hardware support for processor tracing? */
+bool pt_supported;
+
 /* Hardware virtual interrupt delivery enable? */
 bool virtual_intr_delivery_enabled;
 
@@ -630,6 +633,12 @@ static inline bool hvm_altp2m_supported(void)
 return hvm_funcs.altp2m_supported;
 }
 
+/* returns true if hardware supports Intel Processor Trace */
+static inline bool hvm_pt_supported(void)
+{
+return hvm_funcs.pt_supported;
+}
+
 /* updates the current hardware p2m */
 static inline void altp2m_vcpu_update_p2m(struct vcpu *v)
 {
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h 
b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 906810592f..4c81093aba 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -285,6 +285,7 @@ extern u64 vmx_ept_vpid_cap;
 
 #define VMX_MISC_CR3_TARGET             0x01ff0000
 #define VMX_MISC_VMWRITE_ALL            0x20000000
+#define VMX_MISC_PT_SUPPORTED           0x00004000
 
 #define VMX_TSC_MULTIPLIER_MAX          0xffffffffffffffffULL
 
diff --git a/xen/include/public/arch-x86/cpufeatureset.h 
b/xen/include/public/arch-x86/cpufeatureset.h
index 5ca35d9d97..0d3f15f628 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -217,6 +217,7 @@ XEN_CPUFEATURE(SMAP,  5*32+20) /*S  Supervisor Mode 
Access Prevention */
 XEN_CPUFEATURE(AVX512_IFMA,   5*32+21) /*A  AVX-512 Integer Fused Multiply Add 
*/
 XEN_CPUFEATURE(CLFLUSHOPT,5*32+23) /*A  CLFLUSHOPT instruction */
 XEN_CPUFEATURE(CLWB,  5*32+24) /*A  CLWB instruction */
+XEN_CPUFEATURE(IPT,   5*32+25) /*   Intel Processor Trace */
 XEN_CPUFEATURE(AVX512PF,  5*32+26) /*A  AVX-512 Prefetch Instructions */
 XEN_CPUFEATURE(AVX512ER,  5*32+27) /*A  AVX-512 Exponent & Reciprocal 
Instrs */
 XEN_CPUFEATURE(AVX512CD,  5*32+28) /*A  AVX-512 Conflict Detection Instrs 
*/
-- 
2.20.1




[PATCH v2 2/7] x86/vmx: add Intel PT MSR definitions

2020-06-18 Thread Michał Leszczyński
Define constants related to Intel Processor Trace features.

Signed-off-by: Michal Leszczynski 
---
 xen/include/asm-x86/msr-index.h | 37 +
 1 file changed, 37 insertions(+)

diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index b328a47ed8..812516f340 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -621,4 +621,41 @@
 #define MSR_PKGC9_IRTL                 0x0634
 #define MSR_PKGC10_IRTL                0x0635
 
+/* Intel PT MSRs */
+#define MSR_RTIT_OUTPUT_BASE           0x0560
+#define MSR_RTIT_OUTPUT_MASK           0x0561
+#define MSR_RTIT_CTL                   0x0570
+#define RTIT_CTL_TRACEEN               (_AC(1, ULL) << 0)
+#define RTIT_CTL_CYCEN                 (_AC(1, ULL) << 1)
+#define RTIT_CTL_OS                    (_AC(1, ULL) << 2)
+#define RTIT_CTL_USR                   (_AC(1, ULL) << 3)
+#define RTIT_CTL_PWR_EVT_EN            (_AC(1, ULL) << 4)
+#define RTIT_CTL_FUP_ON_PTW            (_AC(1, ULL) << 5)
+#define RTIT_CTL_FABRIC_EN             (_AC(1, ULL) << 6)
+#define RTIT_CTL_CR3_FILTER            (_AC(1, ULL) << 7)
+#define RTIT_CTL_TOPA                  (_AC(1, ULL) << 8)
+#define RTIT_CTL_MTC_EN                (_AC(1, ULL) << 9)
+#define RTIT_CTL_TSC_EN                (_AC(1, ULL) << 10)
+#define RTIT_CTL_DIS_RETC              (_AC(1, ULL) << 11)
+#define RTIT_CTL_PTW_EN                (_AC(1, ULL) << 12)
+#define RTIT_CTL_BRANCH_EN             (_AC(1, ULL) << 13)
+#define RTIT_CTL_MTC_FREQ_OFFSET       14
+#define RTIT_CTL_MTC_FREQ              (0x0fULL << RTIT_CTL_MTC_FREQ_OFFSET)
+#define RTIT_CTL_CYC_THRESH_OFFSET     19
+#define RTIT_CTL_CYC_THRESH            (0x0fULL << RTIT_CTL_CYC_THRESH_OFFSET)
+#define RTIT_CTL_PSB_FREQ_OFFSET       24
+#define RTIT_CTL_PSB_FREQ              (0x0fULL << RTIT_CTL_PSB_FREQ_OFFSET)
+#define RTIT_CTL_ADDR_OFFSET(n)        (32 + 4 * (n))
+#define RTIT_CTL_ADDR(n)               (0x0fULL << RTIT_CTL_ADDR_OFFSET(n))
+#define MSR_RTIT_STATUS                0x0571
+#define RTIT_STATUS_FILTER_EN          (_AC(1, ULL) << 0)
+#define RTIT_STATUS_CONTEXT_EN         (_AC(1, ULL) << 1)
+#define RTIT_STATUS_TRIGGER_EN         (_AC(1, ULL) << 2)
+#define RTIT_STATUS_ERROR              (_AC(1, ULL) << 4)
+#define RTIT_STATUS_STOPPED            (_AC(1, ULL) << 5)
+#define RTIT_STATUS_BYTECNT            (0x1ffffULL << 32)
+#define MSR_RTIT_CR3_MATCH             0x0572
+#define MSR_RTIT_ADDR_A(n)             (0x0580 + (n) * 2)
+#define MSR_RTIT_ADDR_B(n)             (0x0581 + (n) * 2)
+
 #endif /* __ASM_MSR_INDEX_H */
-- 
2.20.1




[PATCH v2 1/7] xen/mm: lift 32 item limit from mfn/gfn arrays

2020-06-18 Thread Michał Leszczyński
Replace on-stack array allocation with heap allocation
in order to lift the limit of 32 items in mfn/gfn arrays
when calling acquire_resource.

Signed-off-by: Michal Leszczynski 
---
 xen/common/memory.c | 39 +--
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c1e5..e02606ebe5 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1050,12 +1050,7 @@ static int acquire_resource(
 {
 struct domain *d, *currd = current->domain;
 xen_mem_acquire_resource_t xmar;
-/*
- * The mfn_list and gfn_list (below) arrays are ok on stack for the
- * moment since they are small, but if they need to grow in future
- * use-cases then per-CPU arrays or heap allocations may be required.
- */
-xen_pfn_t mfn_list[32];
+xen_pfn_t *mfn_list;
 int rc;
 
 if ( copy_from_guest(&xmar, arg, 1) )
@@ -1064,25 +1059,17 @@ static int acquire_resource(
 if ( xmar.pad != 0 )
 return -EINVAL;
 
-if ( guest_handle_is_null(xmar.frame_list) )
-{
-if ( xmar.nr_frames )
-return -EINVAL;
-
-xmar.nr_frames = ARRAY_SIZE(mfn_list);
-
-if ( __copy_field_to_guest(arg, &xmar, nr_frames) )
-return -EFAULT;
-
-return 0;
-}
+mfn_list = xmalloc_array(xen_pfn_t, xmar.nr_frames);
 
-if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
-return -E2BIG;
+if ( ! mfn_list )
+return -EFAULT;
 
 rc = rcu_lock_remote_domain_by_id(xmar.domid, &d);
 if ( rc )
+{
+xfree(mfn_list);
 return rc;
+}
 
 rc = xsm_domain_resource_map(XSM_DM_PRIV, d);
 if ( rc )
@@ -1111,7 +1098,7 @@ static int acquire_resource(
 }
 else
 {
-xen_pfn_t gfn_list[ARRAY_SIZE(mfn_list)];
+xen_pfn_t *gfn_list;
 unsigned int i;
 
 /*
@@ -1120,7 +1107,12 @@ static int acquire_resource(
  *resource pages unless the caller is the hardware domain.
  */
 if ( !is_hardware_domain(currd) )
-return -EACCES;
+{
+rc = -EACCES;
+goto out;
+}
+
+gfn_list = xmalloc_array(xen_pfn_t, xmar.nr_frames);
 
 if ( copy_from_guest(gfn_list, xmar.frame_list, xmar.nr_frames) )
 rc = -EFAULT;
@@ -1133,9 +1125,12 @@ static int acquire_resource(
 if ( rc && i )
 rc = -EIO;
 }
+
+xfree(gfn_list);
 }
 
  out:
+xfree(mfn_list);
 rcu_unlock_domain(d);
 
 return rc;
-- 
2.20.1



[PATCH v2 0/7] Implement support for external IPT monitoring

2020-06-18 Thread Michał Leszczyński
Intel Processor Trace (IPT) is an architectural extension available in modern 
Intel CPUs. It records a detailed trace of processor activity while code 
executes. The recorded trace can be used to reconstruct the control flow, that 
is, to find out which code paths were executed, which branches were taken, and 
so forth.

The abovementioned feature is described in Intel(R) 64 and IA-32 Architectures 
Software Developer's Manual Volume 3C: System Programming Guide, Part 3, 
Chapter 36: "Intel Processor Trace."

This patch series implements an interface that Dom0 could use in order to 
enable IPT for particular vCPUs in DomU, allowing for external monitoring. Such 
a feature has numerous applications like malware monitoring, fuzzing, or 
performance testing.

Thanks also to Tamas K Lengyel for a few preliminary hints before the
first version of this patch series was submitted to xen-devel.

Changes since v1:
  * MSR_RTIT_CTL is managed using MSR load lists
  * other PT-related MSRs are modified only when the vCPU goes out of context
  * the trace buffer is now acquired as a resource
  * added the vmtrace_pt_size parameter in xl.cfg; the size of the trace
    buffer must be specified at domain creation
  * trace buffers are allocated on domain creation and freed on
    domain destruction
  * HVMOP_vmtrace_ipt_enable/disable is limited to enabling/disabling PT;
    these calls no longer manage buffer memory
  * lifted the 32-item limit on MFN/GFN arrays when acquiring resources
  * minor code style changes according to review
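
For illustration, the constraint that patch 4/7 places on vmtrace_pt_size in
hvm_set_vmtrace_pt_size() (at least one page, at most 4 GB, and a power of
two) can be sketched as standalone C. This is only a sketch: the names
PAGE_SIZE_BYTES, MAX_TRACE_BYTES, vmtrace_size_ok() and vmtrace_output_mask()
are hypothetical, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the patch uses PAGE_SIZE and GB(4). */
#define PAGE_SIZE_BYTES  UINT64_C(4096)
#define MAX_TRACE_BYTES  (UINT64_C(4) << 30)

/* Hypothetical helper mirroring the range and power-of-two check. */
bool vmtrace_size_ok(uint64_t value)
{
    if ( value < PAGE_SIZE_BYTES || value > MAX_TRACE_BYTES )
        return false;

    /* A power of two has exactly one bit set. */
    return (value & (value - 1)) == 0;
}

/* The patch stores (size - 1) in ipt->output_mask.raw. */
uint64_t vmtrace_output_mask(uint64_t value)
{
    assert(vmtrace_size_ok(value));
    return value - 1;
}
```

Under these assumptions a 64 KiB buffer is accepted and yields a mask of
0xffff, while a three-page (12 KiB) request is rejected because it is not a
power of two.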

Michal Leszczynski (7):
  xen/mm: lift 32 item limit from mfn/gfn arrays
  x86/vmx: add Intel PT MSR definitions
  x86/vmx: add IPT cpu feature
  x86/vmx: add do_vmtrace_op
  tools/libxc: add xc_vmtrace_* functions
  tools/libxl: add vmtrace_pt_size parameter
  tools/proctrace: add proctrace tool

 tools/golang/xenlight/helpers.gen.go|   2 +
 tools/golang/xenlight/types.gen.go  |   1 +
 tools/libxc/Makefile|   1 +
 tools/libxc/include/xenctrl.h   |  39 +++
 tools/libxc/xc_vmtrace.c|  97 ++
 tools/libxl/libxl_types.idl |   2 +
 tools/libxl/libxl_x86.c |   5 +
 tools/proctrace/COPYING | 339 
 tools/proctrace/Makefile|  50 +++
 tools/proctrace/proctrace.c | 153 +
 tools/xl/xl_parse.c |   4 +
 xen/arch/x86/hvm/hvm.c  | 167 ++
 xen/arch/x86/hvm/vmx/vmcs.c |   4 +
 xen/arch/x86/hvm/vmx/vmx.c  |  24 ++
 xen/arch/x86/mm.c   |  37 +++
 xen/common/domain.c |   1 +
 xen/common/memory.c |  39 +--
 xen/include/asm-x86/cpufeature.h|   1 +
 xen/include/asm-x86/hvm/hvm.h   |   9 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  |  17 +
 xen/include/asm-x86/msr-index.h |  37 +++
 xen/include/public/arch-x86/cpufeatureset.h |   1 +
 xen/include/public/hvm/hvm_op.h |  23 ++
 xen/include/public/hvm/params.h |   5 +-
 xen/include/public/memory.h |   1 +
 xen/include/xen/sched.h |   3 +
 26 files changed, 1039 insertions(+), 23 deletions(-)
 create mode 100644 tools/libxc/xc_vmtrace.c
 create mode 100644 tools/proctrace/COPYING
 create mode 100644 tools/proctrace/Makefile
 create mode 100644 tools/proctrace/proctrace.c

-- 
2.20.1



Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op

2020-06-18 Thread Michał Leszczyński
On 16 Jun 2020 at 19:23, Roger Pau Monné roger@citrix.com wrote:

> On Tue, Jun 16, 2020 at 05:22:06PM +0200, Michał Leszczyński wrote:
>> Provide an interface for privileged domains to manage
>> external IPT monitoring.
>> 
>> Signed-off-by: Michal Leszczynski 
> 
> Thanks for the patch! I have some questions below which require your
> input.
> 
>> ---
>>  xen/arch/x86/hvm/hvm.c  | 170 
>>  xen/include/public/hvm/hvm_op.h |  27 +
>>  2 files changed, 197 insertions(+)
>> 
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 5bb47583b3..9292caebe0 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -4949,6 +4949,172 @@ static int compat_altp2m_op(
>>  return rc;
>>  }
>>  
>> +static int do_vmtrace_op(
>> +XEN_GUEST_HANDLE_PARAM(void) arg)
> 
> No need for the newline, this can fit on a single line.
> 
>> +{
>> +struct xen_hvm_vmtrace_op a;
>> +struct domain *d = NULL;
> 
> I don't think you need to init d to NULL (at least by looking at the
> current code below).
> 
>> +int rc = -EFAULT;
> 
> No need to init rc.
> 
>> +int i;
> 
> unsigned since it's used as a loop counter.
> 
>> +struct vcpu *v;
>> +void* buf;
> 
> Nit: '*' should be prepended to the variable name.
> 
>> +uint32_t buf_size;
> 
> size_t
> 
>> +uint32_t buf_order;
> 
> Order is generally fine using unsigned int, no need to use a
> specifically sized type.
> 
>> +uint64_t buf_mfn;
> 
> Could this use the mfn type?
> 
>> +struct page_info *pg;
>> +
>> +if ( !hvm_ipt_supported() )
>> +return -EOPNOTSUPP;
>> +
>> +if ( copy_from_guest(, arg, 1) )
>> +return -EFAULT;
>> +
>> +if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION )
>> +return -EINVAL;
>> +
>> +switch ( a.cmd )
>> +{
>> +case HVMOP_vmtrace_ipt_enable:
>> +case HVMOP_vmtrace_ipt_disable:
>> +case HVMOP_vmtrace_ipt_get_buf:
>> +case HVMOP_vmtrace_ipt_get_offset:
>> +break;
>> +
>> +default:
>> +return -EOPNOTSUPP;
>> +}
>> +
>> +d = rcu_lock_domain_by_any_id(a.domain);
>> +
>> +if ( d == NULL )
>> +return -ESRCH;
>> +
>> +if ( !is_hvm_domain(d) )
>> +{
>> +rc = -EOPNOTSUPP;
>> +goto out;
>> +}
>> +
>> +domain_pause(d);
>> +
>> +if ( a.vcpu >= d->max_vcpus )
>> +{
>> +rc = -EINVAL;
>> +goto out;
>> +}
>> +
>> +v = d->vcpu[a.vcpu];
>> +
>> +if ( a.cmd == HVMOP_vmtrace_ipt_enable )
> 
> Please use a switch here, you might even consider re-using the switch
> from above and moving the domain checks before actually checking the
> command field, so that you don't need to perform two switches against
> a.cmd.
> 
>> +{
>> +if ( v->arch.hvm.vmx.ipt_state ) {
> 
> Coding style, brace should be on newline (there are more below which
> I'm not going to comment on).
> 
>> +// already enabled
> 
> Comments should use /* ... */, there are multiple instances of this below
> which I'm not going to comment on, please check CODING_STYLE.
> 
> Also, the interface looks racy, I think you are missing a lock to
> protect v->arch.hvm.vmx.ipt_state from being freed under your feet if
> you issue concurrent calls to the interface.
> 
>> +rc = -EINVAL;
>> +goto out;
>> +}
>> +
>> +if ( a.size < PAGE_SIZE || a.size > 100 * PAGE_SIZE ) {
> 
> You can use GB(4) which is easier to read. Should the size also be a
> multiple of a PAGE_SIZE?
> 
>> +// we don't accept trace buffer size smaller than single page
>> +// and the upper bound is defined as 4GB in the specification
>> +rc = -EINVAL;
>> +goto out;
>> +}
> 
> Stray tab.
> 
>> +
>> +buf_order = get_order_from_bytes(a.size);
>> +
>> +if ( (a.size >> PAGE_SHIFT) != (1 << buf_order) ) {
> 
> Oh here is the check. I think you can move this with the checks above
> by doing a.size & ~PAGE_MASK.
> 
>> +rc = -EINVAL;
>> +goto out;
>> +}
>> +
>> +buf = page_to_virt(alloc_domheap_pages(d, buf_order,

Re: [PATCH v1 0/7] Implement support for external IPT monitoring

2020-06-18 Thread Michał Leszczyński
On 17 Jun 2020 at 18:19, Andrew Cooper andrew.coop...@citrix.com wrote:

> On 17/06/2020 04:02, Tamas K Lengyel wrote:
>> On Tue, Jun 16, 2020 at 2:17 PM Andrew Cooper  
>> wrote:
>>> On 16/06/2020 19:47, Michał Leszczyński wrote:
>>>> On 16 Jun 2020 at 20:17, Andrew Cooper andrew.coop...@citrix.com wrote:
>>>>
>>>>> Are there any restrictions on EPT being enabled in the first place?  I'm
>>>>> not aware of any, and in principle we could use this functionality for
>>>>> PV guests as well (using the CPL filter).  Therefore, I think it would
>>>>> be helpful to not tie the functionality to HVM guests, even if that is
>>>>> the only option enabled to start with.
>>>> I think at the moment it's not required to have EPT. This patch series 
>>>> doesn't
>>>> use any translation feature flags, so the output address is always a 
>>>> machine
>>>> physical address, regardless of context. I will check if it could be easily
>>>> used with PV.
>>> If its trivial to add PV support then please do.  If its not, then don't
>>> feel obliged, but please do at least consider how PV support might look
>>> in the eventual feature.
>>>
>>> (Generally speaking, considering "how would I make this work in other
>>> modes where it is possible" leads to a better design.)
>>>
>>>>> The buffer mapping and creation logic is fairly problematic.  Instead of
>>>>> fighting with another opencoded example, take a look at the IOREQ
>>>>> server's use of "acquire resource" which is a mapping interface which
>>>>> supports allocating memory on behalf of the guest, outside of the guest
>>>>> memory, for use by control tools.
>>>>>
>>>>> I think what this wants is a bit somewhere in domain_create to indicate
>>>>> that external tracing is used for this domain (and allocate whatever
>>>>> structures/buffers are necessary), acquire resource to map the buffers
>>>>> themselves, and a domctl for any necessary runtime controls.
>>>>>
>>>> I will check this out, this sounds like a good option as it would remove 
>>>> lots of
>>>> complexity from the existing ipt_enable domctl.
>>> Xen has traditionally opted for a "and turn this extra thing on
>>> dynamically" model, but this has caused no end of security issues and
>>> broken corner cases.
>>>
>>> You can see this still existing in the difference between
>>> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
>>> required to chose the number of vcpus for the domain) and we're making
>>> good progress undoing this particular wart (before 4.13, it was
>>> concerning easy to get Xen to fall over a NULL d->vcpu[] pointer by
>>> issuing other hypercalls between these two).
>>>
>>> There is a lot of settings which should be immutable for the lifetime of
>>> the domain, and external monitoring looks like another one of these.
>>> Specifying it at createdomain time allows for far better runtime
>>> behaviour (you are no longer in a situation where the first time you try
>>> to turn tracing on, you end up with -ENOMEM because another VM booted in
>>> the meantime and used the remaining memory), and it makes for rather
>>> more simple code in Xen itself (at runtime, you can rely on it having
>>> been set up properly, because a failure setting up will have killed the
>>> domain already).
>> I'm not in favor of this being a flag that gets set during domain
>> creation time. It could certainly be the case that some users would
>> want this being on from the start till the end but in other cases you
>> may want to enable it intermittently only for some time in-between
>> particular events. If it's an on/off flag during domain creation you
>> pretty much force that choice on the users and while the overhead of
>> PT is better than say MTF it's certainly not nothing. In case there is
>> an OOM situation enabling IPT dynamically the user can always just
>> pause the VM and wait till memory becomes available.
> 
> There is nothing wrong with having "turn tracing on/off at runtime"
> hypercalls.  It is specifically what I suggested two posts up in this
> thread, but it should be limited to the TraceEn bit in RTIT_CTL.
> 
> What isn't ok is trying to allocate the buffers, write the TOPA, etc on
> first-enable or first-map, beca

Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op

2020-06-18 Thread Michał Leszczyński
On 18 Jun 2020 at 14:51, Jan Beulich jbeul...@suse.com wrote:

> On 18.06.2020 13:55, Roger Pau Monné wrote:
>> On Thu, Jun 18, 2020 at 01:01:39PM +0200, Michał Leszczyński wrote:
>>> It was previously stated that:
>>>
>>>> PVH or HVM domain
>>>> won't be able to use this interface since it has no way to request the
>>>> mapping of a specific mfn into its physmap.
>>>
>>> However, taking LibVMI as an example:
>>>
>>> https://github.com/libvmi/libvmi/blob/c461e20ae88bc62c08c27f50fcead23c27a30f9e/libvmi/driver/xen/xen.c#L51
>>>
>>> An essential abstraction xen_get_memory() relies on xc_map_foreign_range().
>>> Doesn't this mean that it's not usable from PVH or HVM domains, or did I
>>> get it all wrong?
>> 
>> That was my fault, so the buffer mfns are assigned to Xen, and then
>> the Xen domain ID is used to map those, which should work on both PV
>> and HVM (or PVH).
>> 
>> I still think using XENMEM_acquire_resource might be better, but I
>> would let others comment.
> 
> +1
> 
> Jan


I'm trying to implement this right now. I've added some very simple code to 
mm.c just for testing:

---

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e376fc7e8f..aaaefe6d23 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4624,6 +4624,26 @@ int arch_acquire_resource(struct domain *d, unsigned int 
type,
 }
 break;
 }
+
+case XENMEM_resource_vmtrace_buf:
+{
+uint64_t output_base;
+mfn_t mfn;
+unsigned int i;
+
+printk("vmtrace buf acquire\n");
+output_base = d->vcpu[id]->arch.hvm.vmx.ipt_state->output_base;
+mfn = mfn_x(output_base >> PAGE_SHIFT);
+
+rc = 0;
+for ( i = 0; i < nr_frames; i++ )
+{
+__map_domain_page_global(mfn_to_page(mfn + i));
+mfn_list[i] = mfn + i;
+}
+
+break;
+}
 #endif

 default:

---


and then in my "proctrace" tool I'm trying to acquire it like this:

fres = xenforeignmemory_map_resource(
fmem, domid, XENMEM_resource_vmtrace_buf,
/* vcpu: */ 0, /* frame: */ 0, /* num_frames: */ 128, (void **),
PROT_READ, 0);


The ioctl fails with "Argument list too long". It works fine when I request a 
small number of frames (e.g. num_frames: 1 or 32), but fails for any larger 
quantity.

How should I proceed with this? The PT buffer could be large, even up to 4 GB.
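
To put numbers on the problem: the error string corresponds to E2BIG, which
matches the -E2BIG path in acquire_resource() that rejects more frames than
the 32-entry on-stack mfn_list (the limit that patch 1/7 of the v2 series
lifts). A rough sketch of the arithmetic follows, assuming 4 KiB pages; the
helper names frames_for_buffer() and calls_needed() are hypothetical and not
part of any Xen interface.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12   /* assumption: 4 KiB pages */
#define OLD_FRAME_LIMIT    32   /* size of mfn_list[] before patch 1/7 */

/* Number of page frames needed to cover a trace buffer of `bytes` bytes. */
uint64_t frames_for_buffer(uint64_t bytes)
{
    const uint64_t page = UINT64_C(1) << PAGE_SHIFT_ASSUMED;

    assert(bytes != 0);
    return (bytes + page - 1) >> PAGE_SHIFT_ASSUMED;
}

/* Number of requests needed if each request may cover `per_call` frames. */
uint64_t calls_needed(uint64_t frames, uint64_t per_call)
{
    assert(per_call != 0);
    return (frames + per_call - 1) / per_call;
}
```

With these assumptions, the 128-frame request above would already need 4
batches of OLD_FRAME_LIMIT frames, and a 4 GB buffer spans 1048576 frames,
that is, 32768 batches.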


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit

2020-06-18 Thread Michał Leszczyński
On 18 Jun 2020 at 10:52, Roger Pau Monné roger@citrix.com wrote:

> On Wed, Jun 17, 2020 at 08:56:57PM +0200, Michał Leszczyński wrote:
>> On 17 Jun 2020 at 17:14, Andrew Cooper andrew.coop...@citrix.com wrote:
>> 
>> > On 17/06/2020 13:51, Roger Pau Monné wrote:
>> >> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
>> >>> On 17 Jun 2020 at 11:09, Roger Pau Monné roger@citrix.com wrote:
>> >>>
>> >>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields 
>> >>>> -> 24.8.1
>> >>>> VM-Entry Controls
>> >>>> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to 
>> >>>> determine
>> >>>> how it should set the reserved bits.
>> >>> Please look at bit position 18 "Load IA32_RTIT_CTL".
>> >> I think this is something different from what I was referring to.
>> >> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
>> >> loading/storing a specific field on the vmcs that maps to the guest
>> >> IA32_RTIT_CTL.
>> >>
>> >> OTOH MSR load lists can be used to load and store any arbitrary MSR on
>> >> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
>> >> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
>> > 
>> > If I remember the historic roadmaps correctly, there are 3 cases.
>> > 
>> > The first hardware to support PT (Broadwell?) prohibited its use
>> > completely in VMX operations.  In this case, we can use it to trace PV
>> > guests iff we don't enable VMX in hardware to begin with.
>> > 
>> > This was relaxed in later hardware (Skylake?) to permit use within VMX
>> > operations, but without any help in the VMCS.  (i.e. manual context
>> > switching per this patch, or MSR load lists as noted in the SDM.)
>> > 
>> > Subsequent support for "virtualised PT" was added (IceLake?) which adds
>> > the load/save controls, and the ability to translate the output buffer
>> > under EPT.
>> > 
>> > 
>> > All of this is from memory so I'm quite possibly wrong with details, but
>> > I believe this is why the current complexity exists.
>> > 
>> > ~Andrew
>> 
>> 
>> I've managed to toggle MSR_IA32_RTIT_CTL values using MSR load lists, as in:
>> 
>> > 35.5.2.2 Guest-Only Tracing
>> > "For this usage, VM-entry is programmed to enable trace packet generation, 
>> > while
>> > VM-exit is programmed to clear MSR_IA32_RTIT_CTL.TraceEn so as to disable
>> > trace-packet generation in the host."
>> 
>> it actually helped a bit. With patch v1 there were parts of hypervisor 
>> recorded
>> in the trace (i.e. the moment between TRACE_EN being set and actual vmenter,
>> and the moment between vmexit and TRACE_EN being unset). Using MSR load list
>> this was eliminated. This change will be reflected in patch v2.
>> 
>> 
>> I can't however implement any working scenario in which all these MSRs are
>> managed using MSR load lists. As in "35.3.3 Flushing Trace Output": packets 
>> are
>> buffered internally and are flushed only when TRACE_EN bit in 
>> MSR_IA32_RTIT_CTL
>> is set to 0. The values of remaining registers will be stable after 
>> everything
>> is serialized. I think this is too complex for the load lists alone. I believe
>> that currently SDM instructs to use load lists only for toggling this single
>> bit on-or-off.
> 
> I think that's exactly what we want: handling TraceEn at
> vmentry/vmexit, so that no hypervisor packets are recorded. The rest
> of the MSRs can be handled in VMM mode without issues. Switching those
> on every vmentry/vmexit would also add more overhead than needed,
> since I assume they don't need to be modified on every entry/exit?


Assuming that there is a single DomU per pcpu and that DomUs are never migrated 
between pcpus, you never need to modify the remaining MSRs.

In case DomUs are floating or there are multiple DomUs per pcpu, we need to 
read out a few MSRs on vm-exit and restore them on vm-entry. Right now I'm 
always using this approach, as I'm not really sure how to optimize it without 
introducing additional bugs. I will show the implementation in patch v2.
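
To illustrate the read-out/restore idea, here is a minimal sketch that models the pCPU's registers as a plain struct instead of issuing real rdmsr/wrmsr. The struct names, the helper names and the choice of which registers count as "a few MSRs" (output base, output mask pointers, status) are assumptions made for illustration; this is not the code from patch v2.

```c
#include <assert.h>
#include <stdint.h>

#define RTIT_CTL_TRACE_EN (UINT64_C(1) << 0)   /* TraceEn is bit 0 of IA32_RTIT_CTL */

/* Registers assumed to need switching; the exact set is an assumption. */
enum { RTIT_OUTPUT_BASE, RTIT_OUTPUT_MASK, RTIT_STATUS, RTIT_NR };

/* Stand-in for the physical CPU's PT MSRs (hypothetical model, no real MSR access). */
struct pcpu_msrs {
    uint64_t ctl;
    uint64_t regs[RTIT_NR];
};

/* Hypothetical per-vCPU copy of the trace state. */
struct ipt_state {
    uint64_t regs[RTIT_NR];
};

/* After vm-exit: TraceEn must already be clear so the values are stable. */
static void ipt_save_on_vmexit(const struct pcpu_msrs *cpu, struct ipt_state *s)
{
    assert(!(cpu->ctl & RTIT_CTL_TRACE_EN));
    for (int i = 0; i < RTIT_NR; i++)
        s->regs[i] = cpu->regs[i];
}

/* Before vm-entry: load this vCPU's values while tracing is still disabled. */
static void ipt_restore_on_vmentry(struct pcpu_msrs *cpu, const struct ipt_state *s)
{
    assert(!(cpu->ctl & RTIT_CTL_TRACE_EN));
    for (int i = 0; i < RTIT_NR; i++)
        cpu->regs[i] = s->regs[i];
}
```

With a single DomU pinned to a pcpu both helpers become no-ops, which is the optimization mentioned above.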


> 
>> 
>> Thus, for now I propose to stay with MSR_IA32_RTIT_CTL being managed by MSR 
>> load
>> lists and the rest of related MSRs being managed manually.
> 
> Yes, that seems like a good approach.
> 
> Roger.



Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op

2020-06-18 Thread Michał Leszczyński
- On 18 Jun 2020 at 5:20, Tamas K Lengyel tamas.k.leng...@gmail.com wrote:

>> >> +
>> >> +a.mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;
>> >
>> > This will not work for translated domains, ie: a PVH or HVM domain
>> > won't be able to use this interface since it has no way to request the
>> > mapping of a specific mfn into its physmap. I think we need to take
>> > this into account when deciding how the interface should be, so that
>> > we don't corner ourselves with a PV only interface.
>>
>> Please be aware that this is only going to be used by Dom0. Is it a
>> well-supported case that somebody is using PVH/HVM Dom0?
>>
>> I think that all Virtual Machine Introspection stuff currently requires to 
>> have
>> Dom0 PV. Our main goal is to have this working well in combo with VMI.
> 
> FYI the VMI interface doesn't require a PV domain. It works fine from
> PVH dom0 or even from a secondary privileged HVM DomU as well,
> provided you have the right XSM policy to allow that.
> 
> Tamas


It was previously stated that:

> PVH or HVM domain
> won't be able to use this interface since it has no way to request the
> mapping of a specific mfn into its physmap.

However, taking LibVMI as an example:

https://github.com/libvmi/libvmi/blob/c461e20ae88bc62c08c27f50fcead23c27a30f9e/libvmi/driver/xen/xen.c#L51

An essential abstraction, xen_get_memory(), relies on xc_map_foreign_range(). 
Doesn't this mean that it's not usable from PVH or HVM domains, or did I get it 
all wrong?


Best regards,
Michał Leszczyński
CERT Polska



Re: [PATCH v1 0/7] Implement support for external IPT monitoring

2020-06-17 Thread Michał Leszczyński
- On 18 Jun 2020 at 1:29, Kang, Luwei luwei.k...@intel.com wrote:

>> > > How does KVM deal with this, do they insert/modify trace packets on
>> > > trapped and emulated instructions by the VMM?
>> >
>> > The KVM includes instruction decoder and emulator
>> > (arch/x86/kvm/emulate.c), and the guest's memory can be set to
>> > write-protect as well. But it doesn't support Intel PT packets software
>> > emulator.
>> > For KVM, the Intel PT feature will be exposed to KVM guest and KVM guest
>> > can use Intel PT feature like native.
>> 
>> But if such feature is exposed to the guest for its own usage, won't it be
>> missing packets for instructions emulated by the VMM?
> 
> If setting the guest's memory write-protect, I think yes.


Thus, I propose to leave it as it is right now. Anyone who purposely alters the 
VM state should not only consult the IPT trace but also take into account what 
was done "in the meantime" by additional features, e.g. when something was 
altered by a vm_event callback. As Tamas said previously, we usually just want 
to see the particular path leading to a vmexit.

Please also note that there is a PTWRITE instruction that could be used in the 
future to add custom payloads/hints to the PT trace when needed.


> 
> Thanks,
> Luwei Kang
> 
>> 
> > Thanks, Roger.



Re: [PATCH v1 0/7] Implement support for external IPT monitoring

2020-06-17 Thread Michał Leszczyński
- On 17 Jun 2020 at 18:19, Andrew Cooper andrew.coop...@citrix.com wrote:

> On 17/06/2020 04:02, Tamas K Lengyel wrote:
>> On Tue, Jun 16, 2020 at 2:17 PM Andrew Cooper  
>> wrote:
>>> On 16/06/2020 19:47, Michał Leszczyński wrote:
>>>> - On 16 Jun 2020 at 20:17, Andrew Cooper andrew.coop...@citrix.com wrote:
>>>>
>>>>> Are there any restrictions on EPT being enabled in the first place?  I'm
>>>>> not aware of any, and in principle we could use this functionality for
>>>>> PV guests as well (using the CPL filter).  Therefore, I think it would
>>>>> be helpful to not tie the functionality to HVM guests, even if that is
>>>>> the only option enabled to start with.
>>>> I think at the moment it's not required to have EPT. This patch series 
>>>> doesn't
>>>> use any translation feature flags, so the output address is always a 
>>>> machine
>>>> physical address, regardless of context. I will check if it could be easily
>>>> used with PV.
>>> If it's trivial to add PV support then please do.  If it's not, then don't
>>> feel obliged, but please do at least consider how PV support might look
>>> in the eventual feature.
>>>
>>> (Generally speaking, considering "how would I make this work in other
>>> modes where it is possible" leads to a better design.)
>>>
>>>>> The buffer mapping and creation logic is fairly problematic.  Instead of
>>>>> fighting with another opencoded example, take a look at the IOREQ
>>>>> server's use of "acquire resource" which is a mapping interface which
>>>>> supports allocating memory on behalf of the guest, outside of the guest
>>>>> memory, for use by control tools.
>>>>>


One thing that remains unclear to me is the "acquire resource" part. Could you 
give some more details on that?

Assuming that buffers are allocated right from the domain creation, what 
mechanism (instead of xc_map_foreign_range) should I use to map the IPT buffers 
into Dom0?


>>>>> I think what this wants is a bit somewhere in domain_create to indicate
>>>>> that external tracing is used for this domain (and allocate whatever
>>>>> structures/buffers are necessary), acquire resource to map the buffers
>>>>> themselves, and a domctl for any necessary runtime controls.
>>>>>
>>>> I will check this out, this sounds like a good option as it would remove 
>>>> lots of
>>>> complexity from the existing ipt_enable domctl.
>>> Xen has traditionally opted for a "and turn this extra thing on
>>> dynamically" model, but this has caused no end of security issues and
>>> broken corner cases.
>>>
>>> You can see this still existing in the difference between
>>> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
>>> required to choose the number of vcpus for the domain) and we're making
>>> good progress undoing this particular wart (before 4.13, it was
>>> concerningly easy to get Xen to fall over a NULL d->vcpu[] pointer by
>>> issuing other hypercalls between these two).
>>>
>>> There are a lot of settings which should be immutable for the lifetime of
>>> the domain, and external monitoring looks like another one of these.
>>> Specifying it at createdomain time allows for far better runtime
>>> behaviour (you are no longer in a situation where the first time you try
>>> to turn tracing on, you end up with -ENOMEM because another VM booted in
>>> the meantime and used the remaining memory), and it makes for rather
>>> more simple code in Xen itself (at runtime, you can rely on it having
>>> been set up properly, because a failure setting up will have killed the
>>> domain already).
>> I'm not in favor of this being a flag that gets set during domain
>> creation time. It could certainly be the case that some users would
>> want this being on from the start till the end but in other cases you
>> may want to enable it intermittently only for some time in-between
>> particular events. If it's an on/off flag during domain creation you
>> pretty much force that choice on the users and while the overhead of
>> PT is better than say MTF it's certainly not nothing. In case there is
>> an OOM situation enabling IPT dynamically the user can always just
>> pause the VM and wait till memory becomes available.
> 
> There is nothing wrong with having "

Re: [PATCH v1 0/7] Implement support for external IPT monitoring

2020-06-17 Thread Michał Leszczyński
- On 17 Jun 2020 at 18:27, Tamas K Lengyel tamas.k.leng...@gmail.com wrote:

> On Wed, Jun 17, 2020 at 10:19 AM Andrew Cooper
>  wrote:
>>
>> On 17/06/2020 04:02, Tamas K Lengyel wrote:
>> > On Tue, Jun 16, 2020 at 2:17 PM Andrew Cooper  
>> > wrote:
>> >> On 16/06/2020 19:47, Michał Leszczyński wrote:
>> >>> - On 16 Jun 2020 at 20:17, Andrew Cooper andrew.coop...@citrix.com wrote:
>> >>>
>> >>>> Are there any restrictions on EPT being enabled in the first place?  I'm
>> >>>> not aware of any, and in principle we could use this functionality for
>> >>>> PV guests as well (using the CPL filter).  Therefore, I think it would
>> >>>> be helpful to not tie the functionality to HVM guests, even if that is
>> >>>> the only option enabled to start with.
>> >>> I think at the moment it's not required to have EPT. This patch series 
>> >>> doesn't
>> >>> use any translation feature flags, so the output address is always a 
>> >>> machine
>> >>> physical address, regardless of context. I will check if it could be 
>> >>> easily
>> >>> used with PV.
>> >> If it's trivial to add PV support then please do.  If it's not, then don't
>> >> feel obliged, but please do at least consider how PV support might look
>> >> in the eventual feature.
>> >>
>> >> (Generally speaking, considering "how would I make this work in other
>> >> modes where it is possible" leads to a better design.)
>> >>
>> >>>> The buffer mapping and creation logic is fairly problematic.  Instead of
>> >>>> fighting with another opencoded example, take a look at the IOREQ
>> >>>> server's use of "acquire resource" which is a mapping interface which
>> >>>> supports allocating memory on behalf of the guest, outside of the guest
>> >>>> memory, for use by control tools.
>> >>>>
>> >>>> I think what this wants is a bit somewhere in domain_create to indicate
>> >>>> that external tracing is used for this domain (and allocate whatever
>> >>>> structures/buffers are necessary), acquire resource to map the buffers
>> >>>> themselves, and a domctl for any necessary runtime controls.
>> >>>>
>> >>> I will check this out, this sounds like a good option as it would remove 
>> >>> lots of
>> >>> complexity from the existing ipt_enable domctl.
>> >> Xen has traditionally opted for a "and turn this extra thing on
>> >> dynamically" model, but this has caused no end of security issues and
>> >> broken corner cases.
>> >>
>> >> You can see this still existing in the difference between
>> >> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus, (the latter being
>> >> required to choose the number of vcpus for the domain) and we're making
>> >> good progress undoing this particular wart (before 4.13, it was
>> >> concerningly easy to get Xen to fall over a NULL d->vcpu[] pointer by
>> >> issuing other hypercalls between these two).
>> >>
>> >> There are a lot of settings which should be immutable for the lifetime of
>> >> the domain, and external monitoring looks like another one of these.
>> >> Specifying it at createdomain time allows for far better runtime
>> >> behaviour (you are no longer in a situation where the first time you try
>> >> to turn tracing on, you end up with -ENOMEM because another VM booted in
>> >> the meantime and used the remaining memory), and it makes for rather
>> >> more simple code in Xen itself (at runtime, you can rely on it having
>> >> been set up properly, because a failure setting up will have killed the
>> >> domain already).
>> > I'm not in favor of this being a flag that gets set during domain
>> > creation time. It could certainly be the case that some users would
>> > want this being on from the start till the end but in other cases you
>> > may want to enable it intermittently only for some time in-between
>> > particular events. If it's an on/off flag during domain creation you
>> > pretty much force that choice on the users and while the overhead of
>> > PT is better than say MTF it's certainly not nothing. In case there is
>> > an OOM situation enabling IPT dynamically th

Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op

2020-06-17 Thread Michał Leszczyński
- On 16 Jun 2020 at 19:23, Roger Pau Monné roger@citrix.com wrote:

> On Tue, Jun 16, 2020 at 05:22:06PM +0200, Michał Leszczyński wrote:
>> Provide an interface for privileged domains to manage
>> external IPT monitoring.
>> 
>> Signed-off-by: Michal Leszczynski 
> 
> Thanks for the patch! I have some questions below which require your
> input.
> 
>> ---
>>  xen/arch/x86/hvm/hvm.c  | 170 
>>  xen/include/public/hvm/hvm_op.h |  27 +
>>  2 files changed, 197 insertions(+)
>> 
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 5bb47583b3..9292caebe0 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -4949,6 +4949,172 @@ static int compat_altp2m_op(
>>  return rc;
>>  }
>>  
>> +static int do_vmtrace_op(
>> +XEN_GUEST_HANDLE_PARAM(void) arg)
> 
> No need for the newline, this can fit on a single line.
> 
>> +{
>> +struct xen_hvm_vmtrace_op a;
>> +struct domain *d = NULL;
> 
> I don't think you need to init d to NULL (at least by looking at the
> current code below).
> 
>> +int rc = -EFAULT;
> 
> No need to init rc.
> 
>> +int i;
> 
> unsigned since it's used as a loop counter.
> 
>> +struct vcpu *v;
>> +void* buf;
> 
> Nit: '*' should be prepended to the variable name.
> 
>> +uint32_t buf_size;
> 
> size_t
> 
>> +uint32_t buf_order;
> 
> Order is generally fine using unsigned int, no need to use a
> specifically sized type.
> 
>> +uint64_t buf_mfn;
> 
> Could this use the mfn type?
> 
>> +struct page_info *pg;
>> +
>> +if ( !hvm_ipt_supported() )
>> +return -EOPNOTSUPP;
>> +
>> +if ( copy_from_guest(&a, arg, 1) )
>> +return -EFAULT;
>> +
>> +if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION )
>> +return -EINVAL;
>> +
>> +switch ( a.cmd )
>> +{
>> +case HVMOP_vmtrace_ipt_enable:
>> +case HVMOP_vmtrace_ipt_disable:
>> +case HVMOP_vmtrace_ipt_get_buf:
>> +case HVMOP_vmtrace_ipt_get_offset:
>> +break;
>> +
>> +default:
>> +return -EOPNOTSUPP;
>> +}
>> +
>> +d = rcu_lock_domain_by_any_id(a.domain);
>> +
>> +if ( d == NULL )
>> +return -ESRCH;
>> +
>> +if ( !is_hvm_domain(d) )
>> +{
>> +rc = -EOPNOTSUPP;
>> +goto out;
>> +}
>> +
>> +domain_pause(d);
>> +
>> +if ( a.vcpu >= d->max_vcpus )
>> +{
>> +rc = -EINVAL;
>> +goto out;
>> +}
>> +
>> +v = d->vcpu[a.vcpu];
>> +
>> +if ( a.cmd == HVMOP_vmtrace_ipt_enable )
> 
> Please use a switch here, you might even consider re-using the switch
> from above and moving the domain checks before actually checking the
> command field, so that you don't need to perform two switches against
> a.cmd.
> 
>> +{
>> +if ( v->arch.hvm.vmx.ipt_state ) {
> 
> Coding style, brace should be on newline (there are more below which
> I'm not going to comment on).
> 
>> +// already enabled
> 
> Comments should use /* ... */, there are multiple instances of this below
> which I'm not going to comment on, please check CODING_STYLE.
> 
> Also, the interface looks racy, I think you are missing a lock to
> protect v->arch.hvm.vmx.ipt_state from being freed under your feet if
> you issue concurrent calls to the interface.
> 
>> +rc = -EINVAL;
>> +goto out;
>> +}
>> +
>> +if ( a.size < PAGE_SIZE || a.size > 100 * PAGE_SIZE ) {
> 
> You can use GB(4) which is easier to read. Should the size also be a
> multiple of a PAGE_SIZE?
> 
>> +// we don't accept trace buffer size smaller than single page
>> +// and the upper bound is defined as 4GB in the specification
>> +rc = -EINVAL;
>> +goto out;
>> +}
> 
> Stray tab.
> 
>> +
>> +buf_order = get_order_from_bytes(a.size);
>> +
>> +if ( (a.size >> PAGE_SHIFT) != (1 << buf_order) ) {
> 
> Oh here is the check. I think you can move this with the checks above
> by doing a.size & ~PAGE_MASK.


I believe it's stricter than a.size & ~PAGE_MASK. I think that the CPU expects 
the buffer size to be a power of 2, so you can have a 64 MB or 128 MB buffer, 
but not a 96 MB one.
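
A minimal sketch of the difference between the two checks, assuming 4 KiB pages. The helper names are hypothetical and not taken from the patch; PAGE_SIZE is redefined locally so the snippet stands on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Weaker check: size is a whole number of pages (the a.size & ~PAGE_MASK test). */
static bool is_page_aligned(uint64_t size)
{
    return (size & (PAGE_SIZE - 1)) == 0;
}

/* Stricter check: at least one page and exactly a power of two. */
static bool is_valid_ipt_buf_size(uint64_t size)
{
    return size >= PAGE_SIZE && (size & (size - 1)) == 0;
}

/* Page order of a valid size: 4 KiB -> 0, 8 KiB -> 1, and so on. */
static unsigned int ipt_buf_order(uint64_t size)
{
    unsigned int order = 0;

    assert(is_valid_ipt_buf_size(size));
    while ((PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

A 96 MB request passes is_page_aligned() but fails is_valid_ipt_buf_size(), which is the case the order comparison in the patch is meant to catch.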


> 
>>

Re: [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit

2020-06-17 Thread Michał Leszczyński
- On 17 Jun 2020 at 17:14, Andrew Cooper andrew.coop...@citrix.com wrote:

> On 17/06/2020 13:51, Roger Pau Monné wrote:
>> On Wed, Jun 17, 2020 at 01:54:45PM +0200, Michał Leszczyński wrote:
>>> - On 17 Jun 2020 at 11:09, Roger Pau Monné roger@citrix.com wrote:
>>>
>>>> 24 Virtual Machine Control Structures -> 24.8 VM-entry Control Fields -> 
>>>> 24.8.1
>>>> VM-Entry Controls
>>>> Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS to 
>>>> determine
>>>> how it should set the reserved bits.
>>> Please look at bit position 18 "Load IA32_RTIT_CTL".
>> I think this is something different from what I was referring to.
>> Those options you refer to (load/clear IA32_RTIT_CTL) deal with
>> loading/storing a specific field on the vmcs that maps to the guest
>> IA32_RTIT_CTL.
>>
>> OTOH MSR load lists can be used to load and store any arbitrary MSR on
>> vmentry/vmexit, see section 26.4 LOADING MSRS on the SDM. There's
>> already infrastructure on Xen to do so, see vmx_{add/del/find}_msr.
> 
> If I remember the historic roadmaps correctly, there are 3 cases.
> 
> The first hardware to support PT (Broadwell?) prohibited its use
> completely in VMX operations.  In this case, we can use it to trace PV
> guests iff we don't enable VMX in hardware to begin with.
> 
> This was relaxed in later hardware (Skylake?) to permit use within VMX
> operations, but without any help in the VMCS.  (i.e. manual context
> switching per this patch, or MSR load lists as noted in the SDM.)
> 
> Subsequent support for "virtualised PT" was added (IceLake?) which adds
> the load/save controls, and the ability to translate the output buffer
> under EPT.
> 
> 
> All of this is from memory so I'm quite possibly wrong with details, but
> I believe this is why the current complexity exists.
> 
> ~Andrew


I've managed to toggle MSR_IA32_RTIT_CTL values using MSR load lists, as in:

> 35.5.2.2 Guest-Only Tracing
> "For this usage, VM-entry is programmed to enable trace packet generation, 
> while VM-exit is programmed to clear MSR_IA32_RTIT_CTL.TraceEn so as to 
> disable trace-packet generation in the host."

It actually helped a bit. With patch v1, parts of the hypervisor were recorded 
in the trace (i.e. the moment between TRACE_EN being set and the actual vmentry, 
and the moment between vmexit and TRACE_EN being unset). Using the MSR load list 
eliminated this. This change will be reflected in patch v2.
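
As a rough sketch of the guest-only tracing setup quoted above: one entry goes on the VM-entry MSR-load list with TraceEn set, and one on the VM-exit MSR-load list with TraceEn cleared. The entry layout (32-bit index, 32 reserved bits, 64-bit data) follows the SDM's MSR-area format. The function name and the way the exit value is derived from the same control word are assumptions for illustration, and nothing here touches a real VMCS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_RTIT_CTL  0x570u
#define RTIT_CTL_TRACE_EN  (UINT64_C(1) << 0)

/* One 16-byte entry of a VMX MSR-load area. */
struct msr_entry {
    uint32_t index;     /* MSR number */
    uint32_t reserved;  /* must be zero */
    uint64_t data;      /* value loaded by the CPU */
};

/*
 * Hypothetical helper: fill the VM-entry load entry so tracing starts only
 * once the guest runs, and the VM-exit load entry so it stops before any
 * hypervisor code executes.
 */
static void build_guest_only_tracing(uint64_t ctl,
                                     struct msr_entry *entry_load,
                                     struct msr_entry *exit_load)
{
    assert(entry_load != NULL && exit_load != NULL);

    entry_load->index = MSR_IA32_RTIT_CTL;
    entry_load->reserved = 0;
    entry_load->data = ctl | RTIT_CTL_TRACE_EN;

    exit_load->index = MSR_IA32_RTIT_CTL;
    exit_load->reserved = 0;
    exit_load->data = ctl & ~RTIT_CTL_TRACE_EN;
}
```

In Xen the entries would be registered through the existing vmx_{add/del/find}_msr infrastructure mentioned earlier in the thread rather than built by hand.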


However, I can't implement any working scenario in which all these MSRs are 
managed using MSR load lists. As described in "35.3.3 Flushing Trace Output", 
packets are buffered internally and are flushed only when the TRACE_EN bit in 
MSR_IA32_RTIT_CTL is set to 0. The values of the remaining registers are stable 
only after everything is serialized. I think this is too complex for the load 
lists alone. I believe that the SDM currently instructs to use load lists only 
for toggling this single bit on and off.


Thus, for now I propose to stay with MSR_IA32_RTIT_CTL being managed by MSR 
load lists and the rest of related MSRs being managed manually.


Best regards,
Michał Leszczyński
CERT Polska


