Re: [Xen-devel] [PATCH v2 1/6] docs/qemu-deprivilege: Revise and update with status and future plans

2018-09-28 Thread Anthony PERARD
On Fri, Sep 28, 2018 at 12:37:22PM +0100, George Dunlap wrote:
> On Tue, Sep 25, 2018 at 12:20 PM Anthony PERARD
>  wrote:
> >
> > On Fri, Sep 21, 2018 at 06:04:23PM +0100, George Dunlap wrote:
> > > +## Migration
> > > +
> > > +When calling xen-save-devices-state, since QEMU is running in a chroot
> > > +it is not useful to pass a filename (it doesn't even have write access
> > > +inside the chroot). Instead, give it an open fd using the add-fd
> > > +mechanism.
> >
> > That describe an issue only on save. The restore part of a migration
> > also have an issue:
> >
> > On restore, QEMU signal to libxl that it is ready only after priviledge
> > restriction have been put in place (and after or when it receive the
> > migration data). But xenstore isn't available anymore, so restore fails
> > from libxl point-of-view.
> >
> >
> > Or maybe you describe here an issue that would arise when an hypothetic
> > chroot is apply. But the migration issue describe here still apply
> > without chroot and with only -runas, the path libxl give to QEMU is
> > accessible by root only.
> 
> Do our other restrictions -- chroot, deprivilege,  -- all work on
> restore at the moment?

chroot isn't an issue for restore. But running QEMU as non-root user is
currently an issue for restore.

I think the only issue with restore that doesn't also exist when
creating a guest is restricted access to xenstore.

-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/6] docs/qemu-deprivilege: Revise and update with status and future plans

2018-09-28 Thread George Dunlap
On Tue, Sep 25, 2018 at 12:20 PM Anthony PERARD
 wrote:
>
> On Fri, Sep 21, 2018 at 06:04:23PM +0100, George Dunlap wrote:
> > +## Migration
> > +
> > +When calling xen-save-devices-state, since QEMU is running in a chroot
> > +it is not useful to pass a filename (it doesn't even have write access
> > +inside the chroot). Instead, give it an open fd using the add-fd
> > +mechanism.
>
> That describe an issue only on save. The restore part of a migration
> also have an issue:
>
> On restore, QEMU signal to libxl that it is ready only after priviledge
> restriction have been put in place (and after or when it receive the
> migration data). But xenstore isn't available anymore, so restore fails
> from libxl point-of-view.
>
>
> Or maybe you describe here an issue that would arise when an hypothetic
> chroot is apply. But the migration issue describe here still apply
> without chroot and with only -runas, the path libxl give to QEMU is
> accessible by root only.

Do our other restrictions -- chroot, deprivilege,  -- all work on
restore at the moment?

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/6] docs/qemu-deprivilege: Revise and update with status and future plans

2018-09-25 Thread Anthony PERARD
On Fri, Sep 21, 2018 at 06:04:23PM +0100, George Dunlap wrote:
> +## Migration
> +
> +When calling xen-save-devices-state, since QEMU is running in a chroot
> +it is not useful to pass a filename (it doesn't even have write access
> +inside the chroot). Instead, give it an open fd using the add-fd
> +mechanism.

That describe an issue only on save. The restore part of a migration
also have an issue:

On restore, QEMU signal to libxl that it is ready only after priviledge
restriction have been put in place (and after or when it receive the
migration data). But xenstore isn't available anymore, so restore fails
from libxl point-of-view.


Or maybe you describe here an issue that would arise when an hypothetic
chroot is apply. But the migration issue describe here still apply
without chroot and with only -runas, the path libxl give to QEMU is
accessible by root only.


> diff --git a/docs/features/qemu-deprivilege.pandoc 
> b/docs/features/qemu-deprivilege.pandoc
> new file mode 100644
> index 00..b377076606
> --- /dev/null
> +++ b/docs/features/qemu-deprivilege.pandoc
[...]
> +# Technical details
> +
> +See docs/design/qemu-deprivilege.txt for technical details.

Right now, this doc have the ".md" suffix, not ".txt", I don't know if
that matter.


-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/6] docs/qemu-deprivilege: Revise and update with status and future plans

2018-09-25 Thread Ian Jackson
George Dunlap writes ("Re: [PATCH v2 1/6] docs/qemu-deprivilege: Revise and 
update with status and future plans"):
> On 09/24/2018 11:23 AM, Ian Jackson wrote:
> > You also need the tool `fishdescriptor' from src:chiark-utils to get
> > the descriptors out of qemu.  It is in chiark-utils-bin in Debian
> > buster and Debian stretch-backports.
> 
> This was meant to be a somewhat technical description of the mechanism
> of doing the testing (to be implemented by someone implementing the
> feature), rather than a how-to for keen users / testers to actually run
> the test.  What about:

Right.

> "Use `fishdescriptor` from [reference], to pull a file descriptor from a
> running QEMU, then check that it has the desired properties, and that
> hypercalls which are meant to fail do fail.  (The latter is implemented
> in `tools/test/depriv/depriv-fd-checker.c`)."

SGTM.  `[reference]' is `(in Debian this can be found in the binary
package chiark-scripts)'.

> On a related note: Is there any reason not to  move
> osstest-depriv-fd-collector into the tree, perhaps even merging it with
> the functionality in depriv-process-checker?

I would be entirely fine with that.  Its osstest-specific knowledge is
fairly limited.

> >> +## Namespaces for unused functionality (Linux only)
...
> >> +unshare(CLONE_NEWNS | CLONE_NEWIPC)
...
> >> +unshare(CLONE_NEWNET | CLONE_NEWNS | CLONE_NEWIPC)
...
> > The CLONE_NEWIPC overlaps with the IPC unshare discussed above.
> 
> This is the second time I've had to try to explain the difference
> between the above two items; I'm not sure what's not clear about what
> was written.

Your two unshare calls have overlapping flags.  That means they have
overlapping effects.  I think I saw that and then conflated the two
completely.  Sorry about that - but really, it doesn't make sense to
list two unshare calls with overlapping flags as if they were
separate.

> > If you are recording this kind of information here: this will of
> > course not work, because qemu binds and opens things at startup that
> > would be broken by this.  Maybe you want to give a url to a mailing
> > list posting instead of this un-referenced hearsay.
>
> The title of the first says: "...for unused functionaltiy".  IPC
> namespaces are for non-file-based IPCs (i.e., things which are not unix
> sockets).  QEMU does not use this functionality, nor does it use mount
> functionality.  The first restriction is in fact implemented in patch 4,
> and I haven't had any issues with it.

I think I was thinkng of CLONE_NEWNET when I wrote the words above.
Sorry for the confusion.

> >> +## Setting up a userid range
> > 
> > There was some discussion on a Debian list recently about some
> > container systems that encode a 16-bit within-container uid and a
> > 16-bit container number into the 32-bit uid.  I guess we don't need to
> > explicitly worry about clashes between our usage and those ?
> 
> Hmm, someone may run containers that use such things in dom0, at which
> point we may have a namespace collision.
> 
> But really I think this is a distro problem to solve -- we don't specify
> a >16-bit UID, we just give it as an example.  Debian could, for instance:
>  - Not use the system which uses the 16/16 split
>  - Enforce that Xen and the 16/16 split system are not installed at the
> same time
>  - Reserve 32k of UIDs in the 16-bit space somehow
>  - Reserve one of the "container ID" entries for Xen, so that there's
> never a collision

In practice even in Debian things are not this organised.  Currently
Debian's policy docs simply say that uids >2^16 are "reserved".  Work
will need to be done there to formalise this.

> > The limitations section should also say something like this:
> > 
> >  The currently implemented restrictions are thought to be a useful
> >  security improvement.  However, the design and implementation is
> >  preliminary and there is work left to do.  Accordingly we do not
> >  promise that they are sufficient to stop a rogue domain which takes
> >  control of its qemu from escaping into the host, let alone stop it
> >  from denying service to the host.
> 
> Isn't this what "Tech preview" means?  Or do you mean we'll keep this
> kind of warning in after we take it out of 'tech preview'?
> 
> >  Therefore, bugs which affect the effectiveness of the qemu depriv
> >  mechanisms will be treated as plain bugs, not security bugs; they
> >  would not result in a Xen Project Security Advisory.  However, bugs
> >  where the security of a system with dm_restrict=1 is worse than
> >  before, will be treated as security bugs.
> 
> This would be slightly different than 'tech preview'.

Both of these paragraphs are to be taken together.  And yes, it is
somewhat different than pure "tech preview", which is "if you use this
feature, all bets are off".  I am trying to write a sensible promise
which means that someone who turns on dm_restrict is not *weakening*
their security posture.

> Once this goes to "supported", I 

Re: [Xen-devel] [PATCH v2 1/6] docs/qemu-deprivilege: Revise and update with status and future plans

2018-09-24 Thread George Dunlap
On 09/24/2018 11:23 AM, Ian Jackson wrote:
> George Dunlap writes ("[PATCH v2 1/6] docs/qemu-deprivilege: Revise and 
> update with status and future plans"):
>> +## Xen library / file-descriptor restrictions
>> +
>> +'''Description''': Close and restrict Xen-related file descriptors.
>> +Specifically:
>> + * Close all xenstore-related file descriptors
> 
> This is correct.
> 
>> + * Make sure that extraneous `privcmd` and `evtchn` instances are
>> +closed
> 
> No, *all* privcmd and evtchn instances are restricted, even
> `extraneous' ones which have been leaked by qemu.  None are closed.

Ack.

>> +'''How to test''':
>> +
>> +tools/test/depriv/depriv-fd-checker.c
> 
> You also need the tool `fishdescriptor' from src:chiark-utils to get
> the descriptors out of qemu.  It is in chiark-utils-bin in Debian
> buster and Debian stretch-backports.

This was meant to be a somewhat technical description of the mechanism
of doing the testing (to be implemented by someone implementing the
feature), rather than a how-to for keen users / testers to actually run
the test.  What about:

"Use `fishdescriptor` from [reference], to pull a file descriptor from a
running QEMU, then check that it has the desired properties, and that
hypercalls which are meant to fail do fail.  (The latter is implemented
in `tools/test/depriv/depriv-fd-checker.c`)."

On a related note: Is there any reason not to  move
osstest-depriv-fd-collector into the tree, perhaps even merging it with
the functionality in depriv-process-checker?

>> +## Namespaces for unused functionality (Linux only)
>> +
>> +'''Description''': Enter QEMU into its own mount & IPC namespaces.
>> +This means that even if other restrictions fail, the process won't be
>> +able to even name system mount points or exsting non-file-based IPC
>> +descriptors to attempt to attack them.
>> +
>> +'''Implementation''':
>> +
>> +In theory this could be done in QEMU (similar to -sandbox, -runas,
>> +-chroot, and so on), but a patch doing this in QEMU was NAKed
>> +upstream. They preferred that this was done as a setup step by
>> +whatever executes QEMU; i.e., have the process which exec's QEMU first
>> +call:
>> +
>> +unshare(CLONE_NEWNS | CLONE_NEWIPC)
> 
> If you are recording this kind of information here: this will of
> course not work, because qemu binds and opens things at startup that
> would be broken by this.  Maybe you want to give a url to a mailing
> list posting instead of this un-referenced hearsay.
> 
>> +### Network namespacing (Linux only)
>> +
>> +Enter QEMU into its own network namespace (in addition to mount & IPC
>> +namespaces).  Basically change the 'unshare' call to be as follows:
>> +
>> +unshare(CLONE_NEWNET | CLONE_NEWNS | CLONE_NEWIPC)
>> +
>> +QEMU does actually use the network namespace by default, so adding
>> +this restriction requires additional changes, listed below.
> 
> The CLONE_NEWIPC overlaps with the IPC unshare discussed above.

This is the second time I've had to try to explain the difference
between the above two items; I'm not sure what's not clear about what
was written.

The title of the first says: "...for unused functionaltiy".  IPC
namespaces are for non-file-based IPCs (i.e., things which are not unix
sockets).  QEMU does not use this functionality, nor does it use mount
functionality.  The first restriction is in fact implemented in patch 4,
and I haven't had any issues with it.

I suppose I need more text like the above, explicitly defining what
those namespaces do, and that QEMU doesn't need them?

The second section explicitly mentions the fact that this will not work,
and references further changes which would be required for such a change
to be implemented (in the 'Network' and 'VNC' sections).

I can certainly provide a reference if you think that's important.

Any other suggestions for clarification, so that we don't have to repeat
this discussion again, would be welcome.

>> +## Setting up a userid range
> 
> There was some discussion on a Debian list recently about some
> container systems that encode a 16-bit within-container uid and a
> 16-bit container number into the 32-bit uid.  I guess we don't need to
> explicitly worry about clashes between our usage and those ?

Hmm, someone may run containers that use such things in dom0, at which
point we may have a namespace collision.

But really I think this is a distro problem to solve -- we don't specify
a >16-bit UID, we just give it as an example.  Debian could, for instance:
 - Not use the system which uses the 16/16 split
 - Enforce that Xen and the 16/16 split system are not installed at the
same time
 - Reserve 32k of UIDs in the 16-bit space somehow
 - Reserve one of the "container ID" entries for Xen, so that there's
never a collision

>> +# Limitations
>> +
>> +The following features still need to be implemented:
>> + * Inserting a new cdrom while the guest is running (xl cdrom-insert)
>> + * Migration / save / restore
>> +
>> +Additionally, getting PCI 

Re: [Xen-devel] [PATCH v2 1/6] docs/qemu-deprivilege: Revise and update with status and future plans

2018-09-24 Thread Ian Jackson
George Dunlap writes ("[PATCH v2 1/6] docs/qemu-deprivilege: Revise and update 
with status and future plans"):
> +## Xen library / file-descriptor restrictions
> +
> +'''Description''': Close and restrict Xen-related file descriptors.
> +Specifically:
> + * Close all xenstore-related file descriptors

This is correct.

> + * Make sure that extraneous `privcmd` and `evtchn` instances are
> +closed

No, *all* privcmd and evtchn instances are restricted, even
`extraneous' ones which have been leaked by qemu.  None are closed.

> +'''How to test''':
> +
> +tools/test/depriv/depriv-fd-checker.c

You also need the tool `fishdescriptor' from src:chiark-utils to get
the descriptors out of qemu.  It is in chiark-utils-bin in Debian
buster and Debian stretch-backports.

> +## Namespaces for unused functionality (Linux only)
> +
> +'''Description''': Enter QEMU into its own mount & IPC namespaces.
> +This means that even if other restrictions fail, the process won't be
> +able to even name system mount points or exsting non-file-based IPC
> +descriptors to attempt to attack them.
> +
> +'''Implementation''':
> +
> +In theory this could be done in QEMU (similar to -sandbox, -runas,
> +-chroot, and so on), but a patch doing this in QEMU was NAKed
> +upstream. They preferred that this was done as a setup step by
> +whatever executes QEMU; i.e., have the process which exec's QEMU first
> +call:
> +
> +unshare(CLONE_NEWNS | CLONE_NEWIPC)

If you are recording this kind of information here: this will of
course not work, because qemu binds and opens things at startup that
would be broken by this.  Maybe you want to give a url to a mailing
list posting instead of this un-referenced hearsay.

> +### Network namespacing (Linux only)
> +
> +Enter QEMU into its own network namespace (in addition to mount & IPC
> +namespaces).  Basically change the 'unshare' call to be as follows:
> +
> +unshare(CLONE_NEWNET | CLONE_NEWNS | CLONE_NEWIPC)
> +
> +QEMU does actually use the network namespace by default, so adding
> +this restriction requires additional changes, listed below.

The CLONE_NEWIPC overlaps with the IPC unshare discussed above.

> +## Setting up a userid range

There was some discussion on a Debian list recently about some
container systems that encode a 16-bit within-container uid and a
16-bit container number into the 32-bit uid.  I guess we don't need to
explicitly worry about clashes between our usage and those ?

> +# Limitations
> +
> +The following features still need to be implemented:
> + * Inserting a new cdrom while the guest is running (xl cdrom-insert)
> + * Migration / save / restore
> +
> +Additionally, getting PCI passthrough to work securely would require a
> +significant rework of how passthrough works at the moment.  It may be
> +implemented at some point but is not a near-term priority.

The limitations section should also say something like this:

 The currently implemented restrictions are thought to be a useful
 security improvement.  However, the design and implementation is
 preliminary and there is work left to do.  Accordingly we do not
 promise that they are sufficient to stop a rogue domain which takes
 control of its qemu from escaping into the host, let alone stop it
 from denying service to the host.

 Therefore, bugs which affect the effectiveness of the qemu depriv
 mechanisms will be treated as plain bugs, not security bugs; they
 would not result in a Xen Project Security Advisory.  However, bugs
 where the security of a system with dm_restrict=1 is worse than
 before, will be treated as security bugs.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/6] docs/qemu-deprivilege: Revise and update with status and future plans

2018-09-24 Thread Paul Durrant
> -Original Message-
[snip]
> diff --git a/docs/features/qemu-deprivilege.pandoc b/docs/features/qemu-
> deprivilege.pandoc
> new file mode 100644
> index 00..b377076606
> --- /dev/null
> +++ b/docs/features/qemu-deprivilege.pandoc
> @@ -0,0 +1,91 @@
> +% QEMU Deprivileging / dm_restrict
> +% Revision 1
> +
> +\clearpage
> +
> +# Basics
> +
> + 
> + Status: **Tech Preview**
> +
> +Architecture(s): x86
> +
> +   Component(s): toolstack
> +
> + 
> +
> +# Overview
> +
> +By default, the QEMU device model is run in domain 0.  If an attacker
> +can gain control of a QEMU process, it could easily take control of a
> +system.
> +
> +dm_restrict is a set of operations to restrict QEMU running in domain
> +0.  It consists of two halves:
> +
> + 1. Mechanisms to restrict QEMU to only being able to affect its own
> +domain
> + 2. Mechanisms to restruct QEMU's ability to interact with domain 0.
> +
> +# User details
> +
> +## Getting the right versions of software
> +
> +Linux: 4.11+
> +
> +Qemu: 3.0+ (Or the version that comes with Xen 4.12+)
> +
> +## Setting up a userid range
> +
> +For maximum security, libxl needs to run the devicemodel for each
> +domain under a user id (UID) corresponding to its domain id.  There
> +are 32752 possible domain IDs, and so libxl needs 32752 user ids set
> +aside for it.
> +
> +The simplest and most effective way to do this is to allocate a
> +contiguous block of UIDs, and create a single user named
> +`xen-qemuuser-range-base` with the first UID.  For example, under Debian:
> +
> +adduser --no-create-home --uid 65536 --system xen-qemuuser-range-base
> +
> +NOTE: Most modern systems have 32-bit UIDs, and so can in theory go up
> +to 2^31 (or 2^32 if uids are unsigned).  POSIX only guarantees 16-bit
> +UIDs however.  UID 65535 is reserved for an invalid value, and 65534
> +is normally allocated to "nobody".
> +
> +Another, less-secure way is to run all QEMUs as the same UID.  To do
> +this, create a user named `xen-qemuuser-shared`; for example:
> +
> +adduser --no-create-home --system xen-qemuuser-shared
> +
> +## Domain config changes
> +
> +The core domain config change is to add the following line to the
> +domain configuration:
> +
> +dm_restrict=1
> +
> +This will perform a number of restrictions, outlined below in the
> +'Technical details' section.
> +
> +# Technical details
> +
> +See docs/design/qemu-deprivilege.txt for technical details.

Nit... I guess you mean docs/design/qemu-deprivilege.md?

  Paul

> +
> +# Limitations
> +
> +The following features still need to be implemented:
> + * Inserting a new cdrom while the guest is running (xl cdrom-insert)
> + * Migration / save / restore
> +
> +Additionally, getting PCI passthrough to work securely would require a
> +significant rework of how passthrough works at the moment.  It may be
> +implemented at some point but is not a near-term priority.
> +
> +# History
> +
> +
> +Date   Revision Version  Notes
> +--   ---
> +2018-09-14 1Xen 4.12 Imported from docs/misc
> +--   ---
> diff --git a/docs/misc/qemu-deprivilege.txt b/docs/misc/qemu-
> deprivilege.txt
> deleted file mode 100644
> index 58b86a3908..00
> --- a/docs/misc/qemu-deprivilege.txt
> +++ /dev/null
> @@ -1,36 +0,0 @@
> -For security reasons, libxl tries to pass a non-root username to QEMU as
> -argument. During initialization QEMU calls setuid and setgid with the
> -user ID and the group ID of the user passed as argument.
> -Libxl looks for the following users in this order:
> -
> -1) a user named "xen-qemuuser-domid$domid",
> -Where $domid is the domid of the domain being created.
> -This requires the reservation of 65535 uids from xen-qemuuser-domid1
> -to xen-qemuuser-domid65535. To use this mechanism, you might want to
> -create a large number of users at installation time. For example:
> -
> -for ((i=1; i<65536; i++))
> -do
> -adduser --no-create-home --system xen-qemuuser-domid$i
> -done
> -
> -You might want to consider passing --group to adduser to create a new
> -group for each new user.
> -
> -
> -2) a user named "xen-qemuuser-shared"
> -As a fall back if both 1) fails, libxl will use a single user for
> -all QEMU instances. The user is named xen-qemuuser-shared. This is
> -less secure but still better than running QEMU as root. Using this is as
> -simple as creating just one more user on your host:
> -
> -adduser --no-create-home --system xen-qemuuser-shared
> -
> -
> -3) root
> -As a last resort, libxl will start QEMU as root.
> -
> -
> -Please note that running QEMU as non-root causes several features like
> -migration and PCI passthrough to not work properly