Hi all, it was requested I send the notes I took during the design discussion on the ABI / APIs to the list.
Normally I keep these as personal notes, so there may be errors (especially where I did not hear correctly); please feel free to correct or expand. Details may be missing where I am unaware of the history behind something.

-Alex Merritt

Design Discussion: Xen ABIs and APIs

- Chris on remote: Andrew has been working on, and wants to keep working on, a new ABI
- Andrew: put together a collection of documents to understand what we have to work with and what we want to improve, before starting work on any design or iterating on the interfaces we currently have
- link to the document in the design session
  - https://design-sessions.xenproject.org/uid/discussion/disc_3IEQbyaCTkqLf2fFzoze/view
- a number of things we have been aware of for a while
- some attempts to address them on the list
- one problem: if you try to fix only one of them, it brings in discussion of fixing many other items
- everyone has an opinion on what the end result will look like
- existing designs only fix subsets, not the whole thing
- we want to address all the problems from the start, before deciding on a plan to fix them
- enumerate the ABIs and APIs that currently exist
  - the problems are not apparent if you just think about this
  - many folks think this is just the hypercalls
  - there is the enumeration information
  - xen has many bugs
- originally a monorepo with xen, linux, qemu, BSDs, bochs, ...; with “make world” you got a system. All guests were required to have event channels, so no discovery exists: they all had it
  - grant table v2: migrate from an old version of xen to a new one, exercise new code paths, then the kernel crashed
  - initial state of vcpus
- many folks don’t think about them, but they are part of what xen presents; we have bugs describing those via the hypercalls we use
  - the hypercalls themselves
    - 46?
    - half of them specific to PV guests
- x86 HVM / ARM HVM account for only a small fraction of the total hypercalls that exist
- the reason the hypercalls look like this now: Xen started with PV guests on x86, where a VAS (virtual address space) system made sense
  - when HVM guests came along, we got hacks fitting PV guests into HVM
  - Xen has to walk the page tables of the guest just to get the information it needs; you cannot do that in encrypted VMs, by design
  - need to change the way we deal with pointers in the API
- evtchn send: pass pointer information on the stack
  - get interrupt for someone else!
- look over all APIs and ABIs that exist, because they have different problems in different areas
- what XenServer cares most about right now is host UEFI secure boot
  - new privilege boundary that did not exist previously
  - admin with root cannot (should not) violate the security boundary, cannot read/write arbitrary memory
  - hypercalls: open /dev/xen/privcmd and pass pointers into user space memory; nothing stops you passing a pointer to kernel memory
- giant privilege escalation hole in UEFI secure boot
- root user space is not privileged enough to execute arbitrary code
  - all problems compound, thus we want to look at all of them before we start figuring out what to do
- another example: being based on x86 originally, large hypercalls have a shift by 12 and assume 4k pages; a problem with ARM wanting 64k page tables
  - even the data layout wants to change
- if you change the version of Xen, you break the user space (library versions)
  - was an intentional choice early on, doesn’t scale
  - get rid of unstable APIs; they are killing xen
- security hotfix: recompile QEMU
  - ABI rules say any change in the hypervisor means rebuilding user space and QEMU: anything that links against the xen packages!
- Bertrand: we looked at a problem yesterday: how we create and configure a guest, coherency to reach dom0less
  - the code to create a guest exists twice, duplicated code
  - duplicate configuration format
  - if we modify the ABI between dom0 and Xen, we need to look at having a coherent format so we can reuse the same code
- Alex M: can we hide hypercalls via libraries?
  - yes, but currently the versions force a break
  - definitely an option going forward
  - still doesn’t solve the issue, because other libraries in other languages won’t be shielded from unstable ABIs
- Jan: knowing both what to do and where we go is useful
  - Andrew: have to have a broad idea of where to go...
- Jan: carrying out a hypercall is independent of the mechanism we define
  - Andrew: still needs backwards compatibility
  - Andrew: use higher op numbers
- Alex M: is our problem unique to us?
  - Andrew: we have enough corner cases that, yes, it is
  - Bertrand: PV guests require a large number of hypercalls
  - Jan: keep VA for PV hypercalls
- Rich on call: work together with Chris to write down something difficult in scope
  - any work written down is useful for folks on the other side where we may encounter failures
  - newcomers: xen forked by HP (?)
  - everyone tried to narrow to vertical markets, focus on specific markets
  - Xen is the last entity standing, still trying to pull all stakeholders together, but not sure how long it will last
  - if it collapses: accidental or intentional interoperability, carve out the pieces so that the people at the table today have a chance to know what results from it
  - what will last longest: certified entities that have long lifecycles, decades or more
  - certified snapshots will become the longest-lived design choices
- Andrew: shared info page
  - layout was done with unsigned longs, which changed sizes
  - layout of the shared info page changes
  - different vcpus can be in different modes at the same time
  - we cache the mode of the cpu at the point at which it makes one of two types of hypercalls
- another design session tomorrow
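
To make two of the size assumptions mentioned in the notes concrete (the shift by 12 for 4k pages, and unsigned long fields in the shared info page), here is a minimal Python sketch. It is only an illustration under stated assumptions: the field names, the field sizes, and the helper functions `frame_to_addr`, `layout`, and `info_fields` are hypothetical and are not taken from the Xen headers, and the layout rule (each field aligned to its own size) is a simplification of real C ABIs.

```python
# Illustrative only: names and field lists are hypothetical, not Xen's real ABI.


def frame_to_addr(frame, page_shift=12):
    """Turn a frame number into a byte address; shift 12 assumes 4k pages."""
    return frame << page_shift


def layout(fields):
    """Compute C-like offsets, assuming each field is aligned to its own size.

    fields: list of (name, size_in_bytes). Returns ({name: offset}, total_size).
    """
    offsets, cursor, max_align = {}, 0, 1
    for name, size in fields:
        cursor = -(-cursor // size) * size  # round up to the field's alignment
        offsets[name] = cursor
        cursor += size
        max_align = max(max_align, size)
    total = -(-cursor // max_align) * max_align  # pad the record to its alignment
    return offsets, total


def info_fields(ulong_size):
    """A made-up per-vcpu record containing one 'unsigned long' field."""
    return [
        ("upcall_pending", 1),
        ("upcall_mask", 1),
        ("pending_sel", ulong_size),  # e.g. 4 bytes for a 32-bit guest, 8 for 64-bit
        ("time_version", 4),
    ]


# A caller counting in 64k pages (shift 16) and a callee assuming 4k pages
# (shift 12) decode the same frame number to different addresses.
addr_64k = frame_to_addr(3, page_shift=16)
addr_4k = frame_to_addr(3)

# The same declaration yields different offsets for 32-bit and 64-bit guests.
offsets32, size32 = layout(info_fields(4))
offsets64, size64 = layout(info_fields(8))

print(hex(addr_64k), hex(addr_4k))
print(offsets32, size32)
print(offsets64, size64)
```

Both effects disappear when sizes are stated explicitly (fixed-width fields, byte addresses instead of shifted frame numbers); that is a general observation, not something decided in the session.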