RE: [RFC 0/4] Adding Virtual Memory Fuses to Xen

2022-12-20 Thread Smith, Jackson
-Original Message-
> From: Julien Grall 
> Sent: Friday, December 16, 2022 3:39 AM
>
> Hi Stefano,
>
> On 16/12/2022 01:46, Stefano Stabellini wrote:
> > On Thu, 15 Dec 2022, Julien Grall wrote:
> >>>> On 13/12/2022 19:48, Smith, Jackson wrote:
> >>> Yes, we are familiar with the "secret-free hypervisor" work. As you
> >>> point out, both our work and the secret-free hypervisor remove the
> >>> directmap region to mitigate the risk of leaking sensitive guest
> >>> secrets. However, our work is slightly different because it
> >>> additionally prevents attackers from tricking Xen into remapping a
> >>> guest.
> >>
> >> I understand your goal, but I don't think this is achieved (see
> >> above). You would need an entity to prevent write to TTBR0_EL2 in
> >> order to fully protect it.
> >
> > Without a way to stop Xen from reading/writing TTBR0_EL2, we cannot
> > claim that the guest's secrets are 100% safe.
> >
> > But the attacker would have to follow the sequence you outlined above
> > to change Xen's pagetables and remap guest memory before accessing it.
> > It is an additional obstacle for attackers that want to steal other
> > guests' secrets. The size of the code that the attacker would need to
> > inject in Xen would need to be bigger and more complex.
>
> Right, that's why I wrote "with a bit more work". However, the nuance
> you mention doesn't seem to be present in the cover letter:
>
> "This creates what we call "Software Enclaves", ensuring that an
> adversary with arbitrary code execution in the hypervisor STILL cannot
> read/write guest memory."
>
> So if the end goal is really to protect against *all* sorts of
> arbitrary code, then I think we should have a rough idea of what this
> will look like in Xen.
>
>  From a brief look, it doesn't look like it would be possible to
> prevent modification to TTBR0_EL2 (even from EL3). We would need to
> investigate if there are other bits in the architecture to help us.
>
> >
> > Every little helps :-)
>
> I can see how making the life of the attacker more difficult is
> appealing. Yet, the goal needs to be clarified and the risk with the
> approach acknowledged (see above).
>
>

You're right, we should have mentioned this weakness in our first email.
Sorry about the oversight! This is definitely still a limitation that we
have not yet overcome. However, we do think that the increase in
attacker workload that you and Stefano are discussing could still be
valuable to security-conscious Xen users.

It would be nice to find additional architecture features that we can use
to close this hole on arm, but there aren't any that stand out to me
either.

With this limitation in mind, what are the next steps we should take to
support this feature for the Xen community? Is this increase in attacker
workload meaningful enough to justify the inclusion of VMF in Xen?

Thanks,
Jackson





RE: [RFC 0/4] Adding Virtual Memory Fuses to Xen

2022-12-15 Thread Smith, Jackson
Hi Julien,

-Original Message-
From: Julien Grall 
Sent: Tuesday, December 13, 2022 3:55 PM
To: Smith, Jackson 
>
> On 13/12/2022 19:48, Smith, Jackson wrote:
> > Hi Xen Developers,
>
> Hi Jackson,
>
> Thanks for sharing the prototype with the community. Some
> questions/remarks below.
>
> > My team at Riverside Research is currently spending IRAD funding to
> > prototype next-generation secure hypervisor design ideas on Xen. In
> > particular, we are prototyping the idea of Virtual Memory Fuses for
> > Software Enclaves, as described in this paper:
> > https://www.nspw.org/papers/2020/nspw2020-brookes.pdf. Note that
> > that paper talks about OS/Process while we have implemented the idea
> > for Hypervisor/VM.
> >
> > Our goal is to emulate something akin to Intel SGX or AMD SEV, but
> > using only existing virtual memory features common in all processors.
> > The basic idea is not to map guest memory into the hypervisor so that
> > a compromised hypervisor cannot compromise (e.g. read/write) the
> > guest. This idea has been proposed before, however, Virtual Memory
> > Fuses go one step further; they delete the hypervisor's mappings to
> > its own page tables, essentially locking the virtual memory
> > configuration for the lifetime of the system. This creates what we
> > call "Software Enclaves", ensuring that an adversary with arbitrary
> > code execution in the hypervisor STILL cannot read/write guest
> > memory.
>
> I am confused, if the attacker is able to execute arbitrary code, then
> what prevents them from writing code to map/unmap the page?
>
> Skimming through the paper (pages 5-6), it looks like you would need
> to implement extra defenses in Xen to be able to prevent mapping/unmapping
> a page.
>

The key piece is deleting all virtual mappings to Xen's page table
structures. From the paper (4.4.1 last paragraph), "Because all memory
accesses operate through the MMU, even page table memory needs
corresponding page table entries in order to be written to." Without a
virtual mapping to the page table, no code can modify the page table
because it cannot read or write the table. Therefore the mappings to the
guest cannot be restored even with arbitrary code execution.
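
To make that concrete: in today's Xen, changing a page-table page first
requires establishing a virtual mapping to it. A minimal sketch of the
dependency (map_domain_page() and write_pte() are Xen's interfaces; the
table_mfn, idx and new_pte names here are hypothetical):

    /* Any PTE write needs a VA that reaches the page-table page. */
    lpae_t *table = map_domain_page(table_mfn);  /* create the VA */
    write_pte(&table[idx], new_pte);             /* write through it */
    unmap_domain_page(table);

VMF removes the mappings such interfaces (and the directmap behind them)
rely on, so the first step becomes impossible, and with it every
subsequent page-table edit.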

> >
> > With this technique, we protect the integrity and confidentiality of
> > guest memory. However, a compromised hypervisor can still
> read/write
> > register state during traps, or refuse to schedule a guest, denying
> > service. We also recognize that because this technique precludes
> > modifying Xen's page tables after startup, it may not be compatible
> > with all of Xen's potential use cases. On the other hand, there are
> > some uses cases (in particular statically defined embedded systems)
> > where our technique could be adopted with minimal friction.
>
>  From what you wrote, this sounds very much like the project Citrix and
> Amazon worked on called "Secret-free hypervisor", with a twist. In your
> case, you want to prevent the hypervisor from mapping/unmapping the
> guest memory.
>
> You can find some details in [1]. The code is x86 only, but I don't see
> any major blocker to porting it to arm64.
>
>

Yes, we are familiar with the "secret-free hypervisor" work. As you
point out, both our work and the secret-free hypervisor remove the
directmap region to mitigate the risk of leaking sensitive guest
secrets. However, our work is slightly different because it additionally
prevents attackers from tricking Xen into remapping a guest. 

We see our goals and the secret-free hypervisor goals as orthogonal.
While the secret-free hypervisor views guests as untrusted and wants to
keep compromised guests from leaking secrets, our work comes from the
perspective of an individual guest trying to protect its secrets from
the rest of the stack. So it wouldn't be unreasonable to say "I want a
hypervisor that is 'secret-free' and implements VMF". We see them as 
different techniques with overlapping implementations.

> >
> > With this in mind our goal is to work with the Xen community to
> > upstream this work as an optional feature. At this point, we have a
> > prototype implementation of VMF on Xen (the contents of this RFC
> > patch series) that supports dom0less guests on arm64. By sharing our
> > prototype, we hope to socialize our idea, gauge interest, and
> > hopefully gain useful feedback as we work toward upstreaming.
> >
> > ** IMPLEMENTATION **
> > In our current setup we have a static configuration with dom0 and
> > one or two domUs. Soon after boot, Dom0 issues a hypercall through
> > the xenctrl interface to blow the fuse for the domU. In the future,
> > we could also add code to support blowing the fuse automatically on
> > startup, before any domains are un-paused.

[RFC 4/4] Implement VMF for arm64

2022-12-13 Thread Smith, Jackson
Implements the functions from xen/vmf.h for arm64.
Introduces a xen/arch/arm/mm-walk.c helper file for
walking an entire page table structure.
---
 xen/arch/arm/Makefile  |   1 +
 xen/arch/arm/include/asm/mm-walk.h |  53 ++
 xen/arch/arm/include/asm/mm.h  |  11 +++
 xen/arch/arm/mm-walk.c | 181 +
 xen/arch/arm/mm.c  | 198 -
 xen/common/Kconfig |   2 +
 6 files changed, 445 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/arm/include/asm/mm-walk.h
 create mode 100644 xen/arch/arm/mm-walk.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 4d076b2..e358452 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -37,6 +37,7 @@ obj-y += kernel.init.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
 obj-y += mem_access.o
 obj-y += mm.o
+obj-y += mm-walk.o
 obj-y += monitor.o
 obj-y += p2m.o
 obj-y += percpu.o
diff --git a/xen/arch/arm/include/asm/mm-walk.h b/xen/arch/arm/include/asm/mm-walk.h
new file mode 100644
index 000..770cc89
--- /dev/null
+++ b/xen/arch/arm/include/asm/mm-walk.h
@@ -0,0 +1,53 @@
+#ifndef __ARM_MM_WALK_H__
+#define __ARM_MM_WALK_H__
+
+#include 
+
+#define RECURSIVE_IDX ((unsigned long)(XEN_PT_LPAE_ENTRIES-1))
+#define RECURSIVE_VA (RECURSIVE_IDX << ZEROETH_SHIFT)
+
+/*
+ * Remove all mappings in these tables from Xen's address space
+ * Only makes sense if walking a guest's tables
+ */
+#define WALK_HIDE_GUEST_MAPPING (1U << 0)
+/*
+ * Remove all mappings to these tables from Xen's address space
+ * Makes sense if walking a guest's table (hide guest tables from Xen)
+ * Or if walking Xen's tables (lock Xen's virtual memory configuration)
+ */
+#define WALK_HIDE_GUEST_TABLE (1U << 1)
+
+/*
+ * Before we can hide individual table entries,
+ * we need to split the directmap superpages
+ */
+#define WALK_SPLIT_DIRECTMAP_TABLE (1U << 2)
+/*
+ * Like walk table hide, but using recursive mapping
+ * to bypass walking directmap when table is in the directmap
+ */
+#define WALK_HIDE_DIRECTMAP_TABLE (1U << 3)
+
+/* These are useful for development/debug */
+/* Show all pte's for a given address space */
+#define WALK_DUMP_ENTRIES (1U << 4)
+/* Show all mappings for a given address space */
+#define WALK_DUMP_MAPPINGS (1U << 5)
+
+/*
+ * Given the value of a ttbr register, this function walks every valid
+ * entry in the trie (as opposed to dump_pt_walk, which follows a single
+ * address from root to leaf).
+ */
+void do_walk_tables(paddr_t ttbr, int root_level, int nr_root_tables, int flags);
+
+#endif /*  __ARM_MM_WALK_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 68adcac..2e85885 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -209,6 +209,17 @@ extern void mmu_init_secondary_cpu(void);
  * For Arm64, map the region in the directmap area.
  */
 extern void setup_directmap_mappings(unsigned long base_mfn, unsigned long nr_mfns);
+/* Shatter superpages for these mfns if needed */
+extern int split_directmap_mapping(unsigned long mfn, unsigned long nr_mfns);
+/* Remove these mfns from the directmap */
+extern int destroy_directmap_mapping(unsigned long mfn, unsigned long nr_mfns);
+/*
+ * Remove this mfn from the directmap (bypassing normal update code)
+ * This is a workaround for current pgtable update code, which cannot be used
+ * to remove directmap table entries from the directmap (because they are
+ * needed to walk the directmap)
+ */
+extern void destroy_directmap_table(unsigned long mfn);
 /* Map a frame table to cover physical addresses ps through pe */
 extern void setup_frametable_mappings(paddr_t ps, paddr_t pe);
 /* map a physical range in virtual memory */
diff --git a/xen/arch/arm/mm-walk.c b/xen/arch/arm/mm-walk.c
new file mode 100644
index 000..48f9b2d
--- /dev/null
+++ b/xen/arch/arm/mm-walk.c
@@ -0,0 +1,181 @@
+/*
+ * xen/arch/arm/mm-walk.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+
+typedef struct {
+/* Keeps track of all the table offsets so we can reconstruct the VA if we need to */
+int off[4];
+
+/* Keeps track of root level so we can make sense of the table offsets */
+int root_level;
+int root_table_idx; /* only meaningful when nr_root_tables > 1 */
+} 

[RFC 3/4] Add xen superpage splitting support to arm

2022-12-13 Thread Smith, Jackson
Updates the xen_pt_update_entry function in xen/arch/arm/mm.c to
automatically split superpages as needed.
---
 xen/arch/arm/mm.c | 91 +++
 1 file changed, 78 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 6301752..91b9c2b 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -753,8 +753,78 @@ static int create_xen_table(lpae_t *entry)
 }
 
 #define XEN_TABLE_MAP_FAILED 0
-#define XEN_TABLE_SUPER_PAGE 1
-#define XEN_TABLE_NORMAL_PAGE 2
+#define XEN_TABLE_NORMAL_PAGE 1
+
+/* More or less taken from p2m_split_superpage, without the p2m stuff */
+static bool xen_split_superpage(lpae_t *entry, unsigned int level,
+                                unsigned int target, const unsigned int *offsets)
+{
+struct page_info *page;
+lpae_t pte, *table;
+unsigned int i;
+bool rv = true;
+
+mfn_t mfn = lpae_get_mfn(*entry);
+unsigned int next_level = level + 1;
+unsigned int level_order = XEN_PT_LEVEL_ORDER(next_level);
+
+ASSERT(level < target);
+ASSERT(lpae_is_superpage(*entry, level));
+
+page = alloc_domheap_page(NULL, 0);
+if ( !page )
+return false;
+
+table = __map_domain_page(page);
+
+/*
+ * We are either splitting a first level 1G page into 512 second level
+ * 2M pages, or a second level 2M page into 512 third level 4K pages.
+ */
+for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
+{
+lpae_t *new_entry = table + i;
+
+/*
+ * Use the content of the superpage entry and override
+ * the necessary fields, so the correct permissions are kept.
+ */
+pte = *entry;
+lpae_set_mfn(pte, mfn_add(mfn, i << level_order));
+
+/*
+ * First and second level pages set walk.table = 0, but third
+ * level entries set walk.table = 1.
+ */
+pte.walk.table = (next_level == 3);
+
+write_pte(new_entry, pte);
+}
+
+/*
+ * Shatter the superpage down to the level at which we want to make
+ * the changes.
+ * This is done outside the loop to avoid checking the offset to
+ * know whether the entry should be shattered for every entry.
+ */
+if ( next_level != target )
+rv = xen_split_superpage(table + offsets[next_level],
+ level + 1, target, offsets);
+
+clean_dcache_va_range(table, PAGE_SIZE);
+unmap_domain_page(table);
+
+/*
+ * Generate the entry for this new table we created,
+ * and write it back in place of the superpage entry.
+ */
+pte = mfn_to_xen_entry(page_to_mfn(page), MT_NORMAL);
+pte.pt.table = 1;
+write_pte(entry, pte);
+clean_dcache(*entry);
+
+return rv;
+}
 
 /*
  * Take the currently mapped table, find the corresponding entry,
@@ -767,16 +837,15 @@ static int create_xen_table(lpae_t *entry)
  *  XEN_TABLE_MAP_FAILED: Either read_only was set and the entry
  *  was empty, or allocating a new page failed.
  *  XEN_TABLE_NORMAL_PAGE: next level mapped normally
- *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
  */
 static int xen_pt_next_level(bool read_only, unsigned int level,
- lpae_t **table, unsigned int offset)
+ lpae_t **table, const unsigned int *offsets)
 {
 lpae_t *entry;
 int ret;
 mfn_t mfn;
 
-entry = *table + offset;
+entry = *table + offsets[level];
 
 if ( !lpae_is_valid(*entry) )
 {
@@ -790,7 +859,8 @@ static int xen_pt_next_level(bool read_only, unsigned int level,
 
 /* The function xen_pt_next_level is never called at the 3rd level */
 if ( lpae_is_mapping(*entry, level) )
-return XEN_TABLE_SUPER_PAGE;
+/* Shatter the superpage before continuing */
+xen_split_superpage(entry, level, level + 1, offsets);
 
 mfn = lpae_get_mfn(*entry);
 
@@ -915,7 +985,7 @@ static int xen_pt_update_entry(mfn_t root, unsigned long virt,
 table = xen_map_table(root);
 for ( level = HYP_PT_ROOT_LEVEL; level < target; level++ )
 {
-rc = xen_pt_next_level(read_only, level, &table, offsets[level]);
+rc = xen_pt_next_level(read_only, level, &table, offsets);
 if ( rc == XEN_TABLE_MAP_FAILED )
 {
 /*
@@ -941,12 +1011,7 @@ static int xen_pt_update_entry(mfn_t root, unsigned long virt,
 break;
 }
 
-if ( level != target )
-{
-mm_printk("%s: Shattering superpage is not supported\n", __func__);
-rc = -EOPNOTSUPP;
-goto out;
-}
+BUG_ON( level != target );
 
 entry = table + offsets[level];
 
-- 
2.7.4
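
As a sanity check on the MFN arithmetic in the split loop above:
splitting a first-level 1GB entry produces 512 second-level 2MB entries
whose MFNs step by 1 << XEN_PT_LEVEL_ORDER(2) = 512 pages. A worked
example (base MFN illustrative):

    /* Splitting a 1G superpage whose base MFN is 0x80000:
     * next_level = 2, level_order = XEN_PT_LEVEL_ORDER(2) = 9, so
     * entry i maps MFN 0x80000 + (i << 9): 0x80000, 0x80200, 0x80400,
     * ... i.e. consecutive 2M-aligned blocks covering the original 1G.
     */
    lpae_set_mfn(pte, mfn_add(mfn, i << level_order));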




[RFC 2/4] Add VMF tool

2022-12-13 Thread Smith, Jackson
A tool for calling the vmf_op hypercall. Eventually this should be merged
into xl and related libraries.
---
 tools/Makefile |  1 +
 tools/vmf/Makefile | 32 +++
 tools/vmf/vmf.c| 65 ++
 3 files changed, 98 insertions(+)
 create mode 100644 tools/vmf/Makefile
 create mode 100644 tools/vmf/vmf.c

diff --git a/tools/Makefile b/tools/Makefile
index 7997535..ccf36a1 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -9,6 +9,7 @@ SUBDIRS-y += libs
 SUBDIRS-y += flask
 SUBDIRS-y += fuzz
 SUBDIRS-y += xenstore
+SUBDIRS-y += vmf
 SUBDIRS-y += misc
 SUBDIRS-y += examples
 SUBDIRS-y += hotplug
diff --git a/tools/vmf/Makefile b/tools/vmf/Makefile
new file mode 100644
index 000..ac5073b
--- /dev/null
+++ b/tools/vmf/Makefile
@@ -0,0 +1,32 @@
+XEN_ROOT=$(CURDIR)/../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+CFLAGS  += $(CFLAGS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenctrl)
+
+.PHONY: all
+all: build
+
+.PHONY: build
+build: vmf
+
+.PHONY: install
+install: build
+   $(INSTALL_DIR) $(DESTDIR)$(bindir)
+   $(INSTALL_PROG) vmf $(DESTDIR)$(bindir)/vmf
+
+.PHONY: uninstall
+uninstall:
+   rm -f $(DESTDIR)$(bindir)/vmf
+
+.PHONY: clean
+clean:
+   $(RM) -f $(DEPS_RM) vmf vmf.o
+
+.PHONY: distclean
+distclean: clean
+
+vmf: vmf.o Makefile
+   $(CC) $(LDFLAGS) $< -o $@ $(LDLIBS) $(APPEND_LDFLAGS)
+
+-include $(DEPS_INCLUDE)
diff --git a/tools/vmf/vmf.c b/tools/vmf/vmf.c
new file mode 100644
index 000..8b7b293
--- /dev/null
+++ b/tools/vmf/vmf.c
@@ -0,0 +1,65 @@
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+int call(unsigned int cmd, unsigned int domid)
+{
+  int ret;
+
+  xc_interface *xch = xc_interface_open(NULL, NULL, 0);
+  ret = xc_vmf_op(xch, cmd, domid);
+  xc_interface_close(xch);
+
+  return ret;
+}
+
+void help(const char *arg0)
+{
+  printf("Usage:\n");
+  printf("  %s dump\n", arg0);
+  printf("  %s info \n", arg0);
+  printf("  %s tables \n", arg0);
+  printf("  %s unmap \n", arg0);
+  printf("  %s lock\n", arg0);
+}
+
+int get_domid(const char *str) {
+  char *endptr;
  long domid = strtol(str, &endptr, 10);
+  if (domid >= 0)
+return (int)domid;
+
+  printf("Invalid domid (%ld)\n", domid);
+  exit(1);
+}
+
+int main(int argc, const char* argv[])
+{
+  int domid;
+  if (argc == 2) {
+domid = DOMID_IDLE;
+  } else if (argc == 3) {
+domid = get_domid(argv[2]);
+  } else {
+help(argv[0]);
+return 0;
+  }
+
+#define ARG(cmd) ((strcmp(cmd, argv[1]) == 0))
+
+  if (ARG("info"))
+return call(XENVMF_dump_info, domid);
+  else if (ARG("tables"))
+return call(XENVMF_dump_tables, domid);
+  else if (ARG("unmap"))
+return call(XENVMF_unmap, domid);
+  else if (ARG("lock") && (argc == 2))
+return call(XENVMF_unmap, DOMID_IDLE);
+
+  help(argv[0]);
+  return 0;
+}
-- 
2.7.4
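
For reference, a typical sequence with this tool (assuming the domU to
protect has domid 1; "lock" takes no domid because it internally issues
XENVMF_unmap for DOMID_IDLE, locking Xen's own tables):

    vmf unmap 1    # remove domU 1's memory from Xen's address space
    vmf lock       # blow the fuse on Xen's own page tables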




[RFC 1/4] Add VMF Hypercall

2022-12-13 Thread Smith, Jackson
This commit introduces a new vmf_op hypercall. If desired, it could be
merged into an existing hypercall.

Also, introduce a VMF Kconfig option and xen/vmf.h, defining the
arch-specific functions that must be implemented to support VMF.
---
 tools/include/xenctrl.h |   2 +
 tools/libs/ctrl/xc_private.c|   5 ++
 tools/libs/ctrl/xc_private.h|   5 ++
 xen/arch/x86/guest/xen/hypercall_page.S |   2 +
 xen/common/Kconfig  |   3 +
 xen/common/Makefile |   1 +
 xen/common/vmf.c| 111 
 xen/include/hypercall-defs.c|   6 ++
 xen/include/public/vmf.h|  24 +++
 xen/include/public/xen.h|   3 +
 xen/include/xen/vmf.h   |  20 ++
 11 files changed, 182 insertions(+)
 create mode 100644 xen/common/vmf.c
 create mode 100644 xen/include/public/vmf.h
 create mode 100644 xen/include/xen/vmf.h

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2303787..804ddba 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1604,6 +1604,8 @@ long xc_memory_op(xc_interface *xch, unsigned int cmd, void *arg, size_t len);
 
 int xc_version(xc_interface *xch, int cmd, void *arg);
 
+int xc_vmf_op(xc_interface *xch, unsigned int cmd, uint32_t domid);
+
 int xc_flask_op(xc_interface *xch, xen_flask_op_t *op);
 
 /*
diff --git a/tools/libs/ctrl/xc_private.c b/tools/libs/ctrl/xc_private.c
index 2f99a7d..44fe9ba 100644
--- a/tools/libs/ctrl/xc_private.c
+++ b/tools/libs/ctrl/xc_private.c
@@ -555,6 +555,11 @@ int xc_version(xc_interface *xch, int cmd, void *arg)
 return rc;
 }
 
+int xc_vmf_op(xc_interface *xch, unsigned int cmd, uint32_t domid)
+{
+return do_vmf_op(xch, cmd, domid);
+}
+
 unsigned long xc_make_page_below_4G(
 xc_interface *xch, uint32_t domid, unsigned long mfn)
 {
diff --git a/tools/libs/ctrl/xc_private.h b/tools/libs/ctrl/xc_private.h
index ed960c6..fb72cb4 100644
--- a/tools/libs/ctrl/xc_private.h
+++ b/tools/libs/ctrl/xc_private.h
@@ -222,6 +222,11 @@ static inline int do_xen_version(xc_interface *xch, int cmd, xc_hypercall_buffer
 cmd, HYPERCALL_BUFFER_AS_ARG(dest));
 }
 
+static inline int do_vmf_op(xc_interface *xch, unsigned int cmd, uint32_t domid)
+{
+return xencall2(xch->xcall, __HYPERVISOR_vmf_op, cmd, domid);
+}
+
 static inline int do_physdev_op(xc_interface *xch, int cmd, void *op, size_t len)
 {
 int ret = -1;
diff --git a/xen/arch/x86/guest/xen/hypercall_page.S b/xen/arch/x86/guest/xen/hypercall_page.S
index 9958d02..2efdd58 100644
--- a/xen/arch/x86/guest/xen/hypercall_page.S
+++ b/xen/arch/x86/guest/xen/hypercall_page.S
@@ -70,6 +70,8 @@ DECLARE_HYPERCALL(arch_5)
 DECLARE_HYPERCALL(arch_6)
 DECLARE_HYPERCALL(arch_7)
 
+DECLARE_HYPERCALL(vmf_op)
+
 /*
  * Local variables:
  * tab-width: 8
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index f1ea319..3bf92b8 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -92,6 +92,9 @@ config STATIC_MEMORY
 
  If unsure, say N.
 
+config VMF
+   bool "Virtual Memory Fuse Support"
+
 menu "Speculative hardening"
 
 config INDIRECT_THUNK
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 3baf83d..fb9118d 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -48,6 +48,7 @@ obj-y += timer.o
 obj-$(CONFIG_TRACEBUFFER) += trace.o
 obj-y += version.o
 obj-y += virtual_region.o
+obj-$(CONFIG_VMF) += vmf.o
 obj-y += vm_event.o
 obj-y += vmap.o
 obj-y += vsprintf.o
diff --git a/xen/common/vmf.c b/xen/common/vmf.c
new file mode 100644
index 000..20c61d1
--- /dev/null
+++ b/xen/common/vmf.c
@@ -0,0 +1,111 @@
+/**
+ * vmf.c
+ * 
+ * Common implementation of the VMF hypercall
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+
+static void dump_domain_vcpus(struct domain *d)
+{
+struct vcpu *v;
+int i;
+
+if (d == NULL) {
+printk("NULL\n");
+return;
+}
+
+printk("Domain: %d (%d vcpus)\n", d->domain_id, d->max_vcpus);
+#if defined(CONFIG_ARM_64)
+printk("  vttbr: 0x%lx\n", d->arch.p2m.vttbr);
+#endif
+
+i = 0;
+for_each_vcpu(d, v)
+{
+printk("  vcpu [%d: id=%d, proc=%d]: \n", i++, v->vcpu_id, 
v->processor);
+/* archvcpu for arm has: */
+#if defined(CONFIG_ARM_64)
+printk(".ttbr0 is 0x%lx\n", v->arch.ttbr0);
+printk(".ttbr1 is 0x%lx\n", v->arch.ttbr1);
+#endif
+}
+}
+
+static void dump_domains(void)
+{
+struct domain *d;
+
+for_each_domain(d)
+dump_domain_vcpus(d);
+
+/* Dump system domains */
+printk("IDLE DOMAIN:\n");
+dump_domain_vcpus(idle_vcpu[0]->domain);
+printk("HARDWARE DOMAIN:\n");
+dump_domain_vcpus(hardware_domain);
+printk("XEN DOMAIN:\n");
+dump_domain_vcpus(dom_xen);
+printk("IO DOMAIN:\n");
+dump_domain_vcpus(dom_io);
+}

[RFC 0/4] Adding Virtual Memory Fuses to Xen

2022-12-13 Thread Smith, Jackson
Hi Xen Developers,

My team at Riverside Research is currently spending IRAD funding
to prototype next-generation secure hypervisor design ideas
on Xen. In particular, we are prototyping the idea of Virtual
Memory Fuses for Software Enclaves, as described in this paper:
https://www.nspw.org/papers/2020/nspw2020-brookes.pdf. Note that
that paper talks about OS/Process while we have implemented the idea
for Hypervisor/VM.

Our goal is to emulate something akin to Intel SGX or AMD SEV,
but using only existing virtual memory features common in all
processors. The basic idea is not to map guest memory into the
hypervisor so that a compromised hypervisor cannot compromise
(e.g. read/write) the guest. This idea has been proposed before,
however, Virtual Memory Fuses go one step further; they delete the
hypervisor's mappings to its own page tables, essentially locking
the virtual memory configuration for the lifetime of the system. This
creates what we call "Software Enclaves", ensuring that an adversary
with arbitrary code execution in the hypervisor STILL cannot read/write
guest memory.

With this technique, we protect the integrity and confidentiality of
guest memory. However, a compromised hypervisor can still read/write
register state during traps, or refuse to schedule a guest, denying
service. We also recognize that because this technique precludes
modifying Xen's page tables after startup, it may not be compatible
with all of Xen's potential use cases. On the other hand, there are
some uses cases (in particular statically defined embedded systems)
where our technique could be adopted with minimal friction.

With this in mind our goal is to work with the Xen community to
upstream this work as an optional feature. At this point, we have
a prototype implementation of VMF on Xen (the contents of this RFC
patch series) that supports dom0less guests on arm64. By sharing
our prototype, we hope to socialize our idea, gauge interest, and
hopefully gain useful feedback as we work toward upstreaming.

** IMPLEMENTATION **
In our current setup we have a static configuration with dom0 and
one or two domUs. Soon after boot, Dom0 issues a hypercall through
the xenctrl interface to blow the fuse for the domU. In the future,
we could also add code to support blowing the fuse automatically on
startup, before any domains are un-paused.
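
On the dom0 side the sequence looks roughly like this (a sketch using
the xc_vmf_op wrapper from patch 1, with error handling omitted; note
the prototype tool implements "lock" as an unmap of DOMID_IDLE):

    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    xc_vmf_op(xch, XENVMF_unmap, domid);      /* blow the domU's fuse */
    xc_vmf_op(xch, XENVMF_unmap, DOMID_IDLE); /* lock Xen's own tables */
    xc_interface_close(xch);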

Our Xen/arm64 prototype creates Software Enclaves in two steps,
represented by these two functions defined in xen/vmf.h:
void vmf_unmap_guest(struct domain *d);
void vmf_lock_xen_pgtables(void);

In the first, Xen removes mappings to the guest(s). On arm64, Xen
keeps a reference to all of guest memory in the directmap. Right now,
we simply walk all of the guest second stage tables and remove them
from the directmap, although there is probably a more elegant method
for this.
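
A rough sketch of the shape of that first step (do_walk_tables and the
WALK_* flags are from patch 4; the root level and root table count here
are assumptions, since in reality they depend on the VTCR configuration):

    void vmf_unmap_guest(struct domain *d)
    {
        /* Walk the guest's stage-2 tables and drop every page they map,
         * plus the table pages themselves, from Xen's directmap. */
        do_walk_tables(d->arch.p2m.vttbr, /* root_level */ 0,
                       /* nr_root_tables */ 1,
                       WALK_HIDE_GUEST_MAPPING | WALK_HIDE_GUEST_TABLE);
    }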

Second, Xen removes mappings to its own page tables.
On arm64, this also involves manipulating the directmap. One challenge
here is that as we start to unmap our tables from the directmap,
we can't use the directmap to walk them. Our solution here is also a
bit less elegant: we temporarily insert a recursive mapping and use
that to remove page table entries.
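
The recursive trick works because the recursive slot short-circuits the
translation: with root entry RECURSIVE_IDX pointing back at the root
table itself, a carefully constructed VA resolves to a page-table page
rather than to the memory it maps. A minimal sketch of the idea for the
4-level, 4K-granule case (shift names as in Xen's arm lpae headers; this
is the concept, not the patch's exact code):

    /* Return a VA through which the level-3 table serving the region
     * selected by indices (l0, l1, l2) can be read and written. */
    static lpae_t *table_via_recursion(unsigned int l0, unsigned int l1,
                                       unsigned int l2)
    {
        /* One pass through the recursive slot shortens the walk by one
         * level, so the "data page" reached is the level-3 table. */
        return (lpae_t *)(RECURSIVE_VA |
                          ((vaddr_t)l0 << FIRST_SHIFT) |
                          ((vaddr_t)l1 << SECOND_SHIFT) |
                          ((vaddr_t)l2 << THIRD_SHIFT));
    }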

** LIMITATIONS and other closing thoughts **
The current Xen code has obviously been implemented under the
assumption that new pages can be mapped, and that guest virtual
addresses can be read, so this technique will break some Xen
features. However, in the general case (in particular for static
workloads where the number of guests is not changed after boot)
we've seen that Xen rarely needs to access guest memory or adjust
its page tables.

We see a lot of potential synergy with other Xen initiatives like
Hyperlaunch for static domain allocation, or SEV support driving new
hypercall interfaces that don't require reading guest memory. These
features would allow VMF (Virtual Memory Fuses) to work with more
configurations and architectures than our current prototype, which
only supports static configurations on arm64.

We have not yet studied how the prototype VMF implementation impacts
performance. On the surface, there should be no significant changes.
However, cache effects from splitting the directmap superpages could
introduce a performance cost.

Additionally, there is latency introduced by walking all the
tables to retroactively remove guest memory. This could be optimized
by reworking the Xen code to remove the directmap. We've toyed with
the idea, but haven't attempted it yet.

Finally, our initial testing suggests that Xen never reads guest memory
(in a static, non-dom0-enhanced configuration), but we have not really
explored this thoroughly.
We know at least these things work:
  - Dom0less virtual serial terminal
  - Domain scheduling
We are aware that these things currently depend on accessible guest
memory:
  - Some hypercalls take guest pointers as arguments
  - Virtualized MMIO on arm needs to decode certain load/store
    instructions

RE: [IMAGEBUILDER PATCH] uboot-script-gen: allow fit generation with no dom0 kernel

2022-07-27 Thread Smith, Jackson
> -Original Message-
> From: Stefano Stabellini 
>
> On Mon, 25 Jul 2022, Smith, Jackson wrote:
> > Hi Stefano,
> >
> > My colleague Jason Lei and I would like to submit a patch to imagebuilder.
> >
> > It seems that generating a .fit with a true dom0less configuration
> > fails because an extraneous comma is included in the its file.
> >
> > We believe this change resolves the issue.
>
> Hi Jackson, thanks for your contribution!
>
> Yes, I see the problem: the code assumes there is a dom0 kernel. If
> there is no dom0 kernel then load_files will be wrongly starting with
> a ","
>
> I would be happy to commit your patch. I assume I can add your Signed-off-by
> to it, right?

Yes, that is fine. Could you also add Jason, as he worked this out initially?
I've added him to the message so he can give his ok.

>
> Signed-off-by: Jackson Smith 
>
> Signed-off-by is the "Developer Certificate of Origin" which means:
> https://developercertificate.org/
>
>
>
> >
> > Remove extraneous comma in generated its file when no DOM0_KERNEL is
> specified.
> >
> > diff --git a/scripts/uboot-script-gen b/scripts/uboot-script-gen
> > index 8f08cd6..6f94fce 100755
> > --- a/scripts/uboot-script-gen
> > +++ b/scripts/uboot-script-gen
> > @@ -676,7 +676,12 @@ create_its_file_xen()
> >  i=$(( $i + 1 ))
> >  continue
> >  fi
> > -load_files+=", \"domU${i}_kernel\""
> > +   if test -z "$load_files"
> > +   then
> > +   load_files+="\"domU${i}_kernel\""
> > +   else
> > +   load_files+=", \"domU${i}_kernel\""
> > +   fi
> >  cat >> "$its_file" <<- EOF
> >  domU${i}_kernel {
> >  description = "domU${i} kernel binary";
> >
> >



[IMAGEBUILDER PATCH] uboot-script-gen: allow fit generation with no dom0 kernel

2022-07-25 Thread Smith, Jackson
Hi Stefano,

My colleague Jason Lei and I would like to submit a patch to imagebuilder.

It seems that generating a .fit with a true dom0less configuration fails 
because an extraneous comma is included in the its file.

We believe this change resolves the issue.

Thanks,
Jackson

-- >8 --

Remove extraneous comma in generated its file when no DOM0_KERNEL is specified.

diff --git a/scripts/uboot-script-gen b/scripts/uboot-script-gen
index 8f08cd6..6f94fce 100755
--- a/scripts/uboot-script-gen
+++ b/scripts/uboot-script-gen
@@ -676,7 +676,12 @@ create_its_file_xen()
 i=$(( $i + 1 ))
 continue
 fi
-load_files+=", \"domU${i}_kernel\""
+   if test -z "$load_files"
+   then
+   load_files+="\"domU${i}_kernel\""
+   else
+   load_files+=", \"domU${i}_kernel\""
+   fi
 cat >> "$its_file" <<- EOF
 domU${i}_kernel {
 description = "domU${i} kernel binary";





RE: [PATCH v1 10/18] x86: introduce the domain builder

2022-07-22 Thread Smith, Jackson
> -Original Message-
> From: Daniel P. Smith 
>
> On 7/18/22 09:59, Smith, Jackson wrote:
> > Hi Daniel,
> >
> >> -Original Message-
> >> Subject: [PATCH v1 10/18] x86: introduce the domain builder
> >>
> >> This commit introduces the domain builder configuration FDT parser
> >> along with the domain builder core for domain creation. To enable
> >> domain builder to be a cross architecture internal API, a new arch
> >> domain creation call is introduced for use by the domain builder.
> >
> >> diff --git a/xen/common/domain-builder/core.c
> >
> >> +void __init builder_init(struct boot_info *info) {
> >> +struct boot_domain *d = NULL;
> >> +
> >> +info->builder = &builder;
> >> +
> >> +if ( IS_ENABLED(CONFIG_BUILDER_FDT) )
> >> +{
> >
> >> +}
> >> +
> >> +/*
> >> + * No FDT config support or an FDT wasn't present, do an initial
> >> + * domain construction
> >> + */
> >> +printk("Domain Builder: falling back to initial domain build\n");
> >> +info->builder->nr_doms = 1;
> >> +d = &info->builder->domains[0];
> >> +
> >> +d->mode = opt_dom0_pvh ? 0 : BUILD_MODE_PARAVIRTUALIZED;
> >> +
> >> +d->kernel = &info->mods[0];
> >> +d->kernel->kind = BOOTMOD_KERNEL;
> >> +
> >> +d->permissions = BUILD_PERMISSION_CONTROL | BUILD_PERMISSION_HARDWARE;
> >> +d->functions = BUILD_FUNCTION_CONSOLE | BUILD_FUNCTION_XENSTORE |
> >> + BUILD_FUNCTION_INITIAL_DOM;
> >> +
> >> +d->kernel->arch->headroom = bzimage_headroom(bootstrap_map(d->kernel),
> >> +   d->kernel->size);
> >> +bootstrap_map(NULL);
> >> +
> >> +if ( d->kernel->string.len )
> >> +d->kernel->string.kind = BOOTSTR_CMDLINE; }
> >
> > Forgive me if I'm incorrect, but I believe there is an issue with this
> > fallback logic for the case where no FDT was provided.
>
> IIUC, the issue at hand has to deal with patch #15.
>
> > If dom0_mem is not supplied to the xen cmd line, then d->meminfo is
> > never initialized. (See dom0_compute_nr_pages/dom0_build.c:335)
> > This was giving me trouble because bd->meminfo.mem_max.nr_pages was
> > left at 0, effectively clamping dom0 to 0 pages of RAM.
> >

I realize I never shared the exact panic message I was experiencing. Sorry 
about that.
It's "Domain 0 allocation is too small for kernel image" on 
xen/arch/x86/pv/domain_builder.c:534

I think you should be able to consistently reproduce what I'm seeing as long as 
these two conditions are met:
- the dom0_mem cmdline option is _not_ set
- no domain builder device tree is passed to xen (the fallback case I 
identified above)

> > I'm not sure what the best solution is but one (easy) possibility is
> > just initializing meminfo to the dom0 defaults near the end of this 
> > function:
> > d->meminfo.mem_size = dom0_size;
> > d->meminfo.mem_min = dom0_min_size;
> > d->meminfo.mem_max = dom0_max_size;
>
> I believe the correct fix is to this hunk,
>
> @@ -416,7 +379,12 @@ unsigned long __init dom0_compute_nr_pages(
>  }
>  }
>
> -d->max_pages = min_t(unsigned long, max_pages, UINT_MAX);
> +/* Clamp according to min/max limits and available memory (final). */
> +nr_pages = max(nr_pages, min_pages);
> +nr_pages = min(nr_pages, max_pages);
> +nr_pages = min(nr_pages, avail);
> +
> +bd->domain->max_pages = min_t(unsigned long, max_pages, UINT_MAX);
>
> Before that last line, there should be a clamp up of max_pages, e.g.
>
> nr_pages = max(nr_pages, min_pages);
> nr_pages = min(nr_pages, max_pages);
> nr_pages = min(nr_pages, avail);
>
> max_pages = max(nr_pages, max_pages);
>
> bd->domain->max_pages = min_t(unsigned long, max_pages, UINT_MAX);
>
> v/r,
> dps

I don't believe this resolves my issue.

If max_pages is 0 before these 5 lines, then the second line will still clamp 
nr_pages to 0 and the panic on line 534 will be hit.
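
Concretely, with illustrative values nr_pages = 262144, min_pages = 0
and avail = 500000, an uninitialized max_pages = 0 gives:

  nr_pages = max(262144, 0)      = 262144
  nr_pages = min(262144, 0)      = 0        <- clamped by max_pages
  nr_pages = min(0, 500000)      = 0

so raising max_pages only after these clamps cannot rescue nr_pages.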

Before patch 15, this max limit came directly from dom0_max_size, which has a 
default value of { .nr_pages = LONG_MAX }, so no clamping will occur unless 
overridden by the cmd line.

After patch 15, bd->meminfo.mem_max is used as the max limit. (unless 
overridden by the cmdline)
I'

RE: [PATCH v1 00/18] Hyperlaunch

2022-07-19 Thread Smith, Jackson
Hi Daniel,

> -Original Message-
> Subject: [PATCH v1 00/18] Hyperlaunch

With the adjustments that I suggested in other messages, this patch builds and 
boots for me on x86 (including a device tree with a domU). I will continue to 
poke around and see if I discover any other rough edges.

One strange behavior I see is that xen fails to start the Dom0 kernel on a warm 
reboot. I'm using qemu_system_x86 with the KVM backend to test out the patch. 
After starting qemu, xen will boot correctly only once. If I attempt to reboot 
the virtual system (through the 'reboot' command in dom0 or the 'system_reset' 
qemu monitor command) without exiting/starting a new qemu process on the host 
machine, xen panics while booting after printing this:

(XEN) *** Building Dom0 ***
(XEN) Dom0 has maximum 856 PIRQs
(XEN) *** Constructing a PV Dom0 ***
(XEN) ELF: not an ELF binary
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Could not construct domain 0
(XEN) ****************************************

This happens with the BUILDER_FDT config option on and off, and regardless of 
what dtb (if any) I pass to xen. I don't see this behavior if I switch back to 
xen's master branch.

Hopefully that explanation made sense. Let me know if I can provide any further 
information about my setup.

Thanks,
Jackson

Also, I apologize that my last messages included a digital signature. Should be 
fixed now.



RE: [PATCH v1 10/18] x86: introduce the domain builder

2022-07-18 Thread Smith, Jackson
Hi Daniel,

> -Original Message-
> Subject: [PATCH v1 10/18] x86: introduce the domain builder
> 
> This commit introduces the domain builder configuration FDT parser along
> with the domain builder core for domain creation. To enable domain builder
> to be a cross architecture internal API, a new arch domain creation
> call is introduced for use by the domain builder.

> diff --git a/xen/common/domain-builder/core.c

> +void __init builder_init(struct boot_info *info) {
> +struct boot_domain *d = NULL;
> +
> +info->builder = &builder;
> +
> +if ( IS_ENABLED(CONFIG_BUILDER_FDT) )
> +{

> +}
> +
> +/*
> + * No FDT config support or an FDT wasn't present, do an initial
> + * domain construction
> + */
> +printk("Domain Builder: falling back to initial domain build\n");
> +info->builder->nr_doms = 1;
> +d = &info->builder->domains[0];
> +
> +d->mode = opt_dom0_pvh ? 0 : BUILD_MODE_PARAVIRTUALIZED;
> +
> +d->kernel = &info->mods[0];
> +d->kernel->kind = BOOTMOD_KERNEL;
> +
> +d->permissions = BUILD_PERMISSION_CONTROL | BUILD_PERMISSION_HARDWARE;
> +d->functions = BUILD_FUNCTION_CONSOLE | BUILD_FUNCTION_XENSTORE |
> + BUILD_FUNCTION_INITIAL_DOM;
> +
> +d->kernel->arch->headroom = bzimage_headroom(bootstrap_map(d->kernel),
> +   d->kernel->size);
> +bootstrap_map(NULL);
> +
> +if ( d->kernel->string.len )
> +d->kernel->string.kind = BOOTSTR_CMDLINE; }

Forgive me if I'm incorrect, but I believe there is an issue with this
fallback logic for the case where no FDT was provided.

If dom0_mem is not supplied to the xen cmd line, then d->meminfo is never
initialized. (See dom0_compute_nr_pages/dom0_build.c:335)
This was giving me trouble because bd->meminfo.mem_max.nr_pages was left at
0, effectively clamping dom0 to 0 pages of RAM.

I'm not sure what the best solution is but one (easy) possibility is just
initializing meminfo to the dom0 defaults near the end of this function:
d->meminfo.mem_size = dom0_size;
d->meminfo.mem_min = dom0_min_size;
d->meminfo.mem_max = dom0_max_size;

Thanks,
Jackson




RE: [PATCH v1 04/18] x86: refactor entrypoints to new boot info

2022-07-18 Thread Smith, Jackson
Hi Daniel,

I hope Outlook gets this reply right.

> -Original Message-
> Subject: [PATCH v1 04/18] x86: refactor entrypoints to new boot info

> diff --git a/xen/arch/x86/guest/xen/pvh-boot.c b/xen/arch/x86/guest/xen/pvh-boot.c
> index 834b1ad16b..28cf5df0a3 100644
> --- a/xen/arch/x86/guest/xen/pvh-boot.c
> +++ b/xen/arch/x86/guest/xen/pvh-boot.c

> @@ -99,13 +118,16 @@ static void __init get_memory_map(void)
>  sanitize_e820_map(e820_raw.map, &e820_raw.nr_map);
>  }
>
> -void __init pvh_init(multiboot_info_t **mbi, module_t **mod)
> +void __init pvh_init(struct boot_info **bi)
>  {
> -convert_pvh_info(mbi, mod);
> +*bi = init_pvh_info();
> +convert_pvh_info(*bi);
>
>  hypervisor_probe();
>  ASSERT(xen_guest);
>
> +(*bi)->arch->xen_guest = xen_guest;

I think you may have a typo/missed refactoring here?
I changed this line to "(*bi)->arch->xenguest = xen_guest;" to get the 
patchset to build.

The arch_boot_info struct in boot_info32.h has a field 'xen_guest' but the 
same field in asm/bootinfo.h was re-named from 'xen_guest' to 'xenguest' in 
the 'x86: adopt new boot info structures' commit.

What was your intent?

> +
>  get_memory_map();
>  }
>

Thanks,
Jackson Smith




RE: [PATCH v1 07/18] docs: update hyperlaunch device tree documentation

2022-07-18 Thread Smith, Jackson
Hi Daniel,

> -Original Message-
> Subject: [PATCH v1 07/18] docs: update hyperlaunch device tree
> documentation


> diff --git a/docs/designs/launch/hyperlaunch-devicetree.rst b/docs/designs/launch/hyperlaunch-devicetree.rst
> index b49c98cfbd..ae1a786d0b 100644
> --- a/docs/designs/launch/hyperlaunch-devicetree.rst
> +++ b/docs/designs/launch/hyperlaunch-devicetree.rst
> @@ -13,12 +13,268 @@ difference is the introduction of the ``hypervisor``

> +
> +The Hypervisor node
> +---
> +
> +The ``hypervisor`` node is a top level container for the domains that
> +will be built by the hypervisor on start up. The node will be named
> +``hypervisor`` with a ``compatible`` property to identify which
> +hypervisors the configuration is intended for.
^^^ Should there be a note here that hypervisor node also needs a compatible 
"xen,"?

> +The hypervisor node will consist of one or more config nodes and one
> +or more domain nodes.
> +
> +Properties
> +""
> +
> +compatible
> +  Identifies which hypervisors the configuration is compatible with. Required.
> +
> +  Format: "hypervisor,", e.g "hypervisor,xen"
^^^ Same here: compatible ","?

>  Example Configuration
>  -
> +
> +Multiboot x86 Configuration Dom0-only:
> +""
> +The following dts file can be provided to the Device Tree compiler,
> +``dtc``, to produce a dtb file.
> +::
> +
> +  /dts-v1/;
> +
> +  / {
> +  chosen {
> +  hypervisor {
> +  compatible = "hypervisor,xen";
  compatible = "hypervisor,xen", "xen,x86";

> +
> +  dom0 {
> +  compatible = "xen,domain";
> +
> +  domid = <0>;
> +
> +  permissions = <3>;
> +  functions = <0xC00F>;
> +  mode = <5>;
> +
> +  domain-uuid = [B3 FB 98 FB 8F 9F 67 A3 8A 6E 62 5A 09 13 F0 8C];
> +
> +  cpus = <1>;
> +  memory = <0x0 0x2000>;
^^ memory = "2048M";
Needs to be updated to new format for mem.

> +
> +  kernel {
> +  compatible = "module,kernel", "module,index";
> +  module-index = <1>;
> +  };
> +  };
> +
> +  };
> +  };
> +  };
> +

Similar adjustments are needed for the rest of the examples I believe.

Also, two typos:
Line 287 is missing a line ending semi-colon.
Line 82 has a double space between 'node' and 'may'.

Best,
Jackson

