Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-06-12 Thread Elena Ufimtseva
On Wed, Jun 12, 2019 at 05:24:13PM +0100, Stefan Hajnoczi wrote:
> On Thu, May 30, 2019 at 01:54:35PM -0700, Elena Ufimtseva wrote:
> > On Tue, May 28, 2019 at 08:18:20AM -0700, Elena Ufimtseva wrote:
> > > On Thu, May 23, 2019 at 12:11:30PM +0100, Stefan Hajnoczi wrote:
> > > > Hi Jag and Elena,
> > > > Do you think a call would help to move discussion along more quickly?
> > > >
> > > 
> > > Hi Stefan,
> > > 
> > > We would like to join this call.
> > > And thank you for inviting us!
> > > 
> > > Elena
> > > > We could use the next KVM Community Call on June 4th to discuss
> > > > remaining concerns and the next steps:
> > > > https://calendar.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
> > > >
> > > > I also hope to include other core QEMU developers.  As you know, I'm
> > > > skeptical, but it could be just me and I don't want to block you
> > > > unnecessarily if others are more enthusiastic about this approach.
> > > >
> > 
> > Hi Stefan
> > 
> > A few questions we have are about the call.
> > What is the format of the call usually? Should we provide some kind of the 
> > project outline for 5 minutes?
> > We are planning to address some of the concerns you have voiced in regards 
> > to amount of changes, usability,
> > security and performance. I assume there will be other questions as well. 
> > Is there any time limit per topic?
> > 
> > And would you mind sharing the call details with us?
> 
> Hi Elena and Jag,

Hi Stefan,

> Sorry, I was away on sick leave. 

Ah, sorry to hear that. We had guessed that you were away, but thought
people were mostly on vacation.

> The KVM Community Call is informal.
> The goal is to get people together in a teleconference where we can
> discuss topics much more quickly than on the mailing list.  This can
> help make progress in areas where the mailing list discussion seems to
> be making slow progress.
> 
> I would suggest starting with a status update that describes your
> current approach (without assuming the audience has familiarity).  Then
> you could touch on any issues where you'd like input from the community
> and you could take questions.
> 
> Our goal should be to get a consensus on whether disaggregated QEMU can
> be merged or not.
>

Thanks!
> Here are the calendar details (Tuesday, June 18th at 8:00 UTC):
> https://calendar.google.com/calendar/ical/tob1tjqp37v8evp74h0q8kpjqs%40group.calendar.google.com/public/basic.ics
> 
> Is this time okay for you?

Yes, this time is fine.
Do you have dial-in info for us?

Thank you!

Elena, Jag and JJ
> 
> Stefan





Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-06-12 Thread Stefan Hajnoczi
On Thu, May 30, 2019 at 01:54:35PM -0700, Elena Ufimtseva wrote:
> On Tue, May 28, 2019 at 08:18:20AM -0700, Elena Ufimtseva wrote:
> > On Thu, May 23, 2019 at 12:11:30PM +0100, Stefan Hajnoczi wrote:
> > > Hi Jag and Elena,
> > > Do you think a call would help to move discussion along more quickly?
> > >
> > 
> > Hi Stefan,
> > 
> > We would like to join this call.
> > And thank you for inviting us!
> > 
> > Elena
> > > We could use the next KVM Community Call on June 4th to discuss
> > > remaining concerns and the next steps:
> > > https://calendar.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
> > >
> > > I also hope to include other core QEMU developers.  As you know, I'm
> > > skeptical, but it could be just me and I don't want to block you
> > > unnecessarily if others are more enthusiastic about this approach.
> > >
> 
> Hi Stefan
> 
> A few questions we have are about the call.
> What is the format of the call usually? Should we provide some kind of the 
> project outline for 5 minutes?
> We are planning to address some of the concerns you have voiced in regards to 
> amount of changes, usability,
> security and performance. I assume there will be other questions as well. Is 
> there any time limit per topic?
> 
> And would you mind sharing the call details with us?

Hi Elena and Jag,
Sorry, I was away on sick leave.  The KVM Community Call is informal.
The goal is to get people together in a teleconference where we can
discuss topics much more quickly than on the mailing list.  This can
help make progress in areas where the mailing list discussion seems to
be making slow progress.

I would suggest starting with a status update that describes your
current approach (without assuming the audience has familiarity).  Then
you could touch on any issues where you'd like input from the community
and you could take questions.

Our goal should be to get a consensus on whether disaggregated QEMU can
be merged or not.

Here are the calendar details (Tuesday, June 18th at 8:00 UTC):
https://calendar.google.com/calendar/ical/tob1tjqp37v8evp74h0q8kpjqs%40group.calendar.google.com/public/basic.ics

Is this time okay for you?

Stefan




Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-06-11 Thread Jag Raman




On 5/30/2019 4:54 PM, Elena Ufimtseva wrote:

> On Tue, May 28, 2019 at 08:18:20AM -0700, Elena Ufimtseva wrote:
> > On Thu, May 23, 2019 at 12:11:30PM +0100, Stefan Hajnoczi wrote:
> > > Hi Jag and Elena,
> > > Do you think a call would help to move discussion along more quickly?
> > >
> >
> > Hi Stefan,
> >
> > We would like to join this call.
> > And thank you for inviting us!
> >
> > Elena
> > > We could use the next KVM Community Call on June 4th to discuss
> > > remaining concerns and the next steps:
> > > https://calendar.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
> > >
> > > I also hope to include other core QEMU developers.  As you know, I'm
> > > skeptical, but it could be just me and I don't want to block you
> > > unnecessarily if others are more enthusiastic about this approach.
> > >
>
> Hi Stefan
>
> A few questions we have are about the call.
> What is the format of the call usually? Should we provide some kind of the
> project outline for 5 minutes?
> We are planning to address some of the concerns you have voiced in regards to
> amount of changes, usability,
> security and performance. I assume there will be other questions as well. Is
> there any time limit per topic?
>
> And would you mind sharing the call details with us?
>
> Thanks!
> Elena
> >
> > > Stefan


Hi Stefan,

We would like to add multi-process QEMU to the agenda for any of the
upcoming KVM community calls. Do you know how we could go about doing
this?

Could you kindly share the contact details of the organizer for this
meeting?

Thank you very much!
--
Jag








Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-06-11 Thread Jag Raman




On 5/23/2019 6:40 AM, Stefan Hajnoczi wrote:

> On Tue, May 07, 2019 at 03:00:52PM -0400, Jag Raman wrote:
> > Hi Stefan,
> >
> > Thank you very much for your feedback. Following is a summary of the
> > discussions our team had regarding your feedback.
> >
> > On 4/25/2019 11:44 AM, Stefan Hajnoczi wrote:
> > >
> > > Can multiple LSI SCSI controllers be launched such that each process
> > > only has access to a subset of disk images?  Or is the disk image label
> > > per-VM so that there is no isolation between LSI SCSI controller
> > > processes for that VM?
> >
> > Yes, it is possible to provide each process with access to a subset of
> > disk images. The Orchestrator (libvirt, etc.) assigns a set of MCS
> > Categories to each VM, then device instances can be isolated by being
> > assigned a subset of the VM’s Categories.
> >
> > >
> > > My concern with this overall approach is the practicality vs its
> > > benefits.  Regarding practicality, each emulated device needs to be
> > > proxied separately.  The QEMU subsystem used by the device also needs to
> > > be proxied.  Global state, monitor commands, and live migration all
> > > require code changes to support proxied operation.  This is very
> > > invasive.
> > >
> > > Then each emulated device needs an SELinux policy to achieve the
> > > benefits of confinement.  I have no idea how to correctly write a policy
> > > like this and it's likely that developers who contribute a single new
> > > device will not be proficient in it either.  Writing these policies is a
> > > rare thing and few people will be good at this.  It also makes me worry
> > > about how we test and review them.
> >
> > We also think that having an SELinux policy per device would become
> > complicated. Our proposal, therefore, is to define SELinux policies for
> > each device class - viz. disk, network, console, graphics, etc.
> > The upstream "fedora-selinux" repo [1] will contain these policies, so the
> > device developer doesn't have to worry about defining new policies for
> > each device. This proposal would diminish the complexity of SELinux
> > policies.
>
> Have you considered using Linux namespaces?  I'm beginning to think that
> SELinux becomes less relevant with pid and mount namespaces to isolate
> processes.  The advantage of namespaces is that they are easy to
> understand and can be expressed in code instead of a policy file in a
> separate package.  This is the approach we're taking with virtiofsd
> (vhost-user device backend for virtio-fs).
>
> > >
> > > Despite the efforts required in making this work, all processes still
> > > effectively have full access to the guest since they can access guest
> > > RAM.  What I mean is that the device is actually not confined to its
> > > host process (e.g. LSI SCSI controller process) because it can write
> > > code to executable guest RAM pages.  The guest will then execute that
> > > code and therefore all guest I/O (networking, disk, etc) is still
> > > available indirectly to the "confined" processes.  They are not really
> > > sandboxed from the outside world, regardless of how strict the SELinux
> > > policy is :(.
> > >
> > > There are performance issues due to proxying as well, but let's ignore
> > > them for now and focus on security.
> >
> > We are also focusing on performance. Please take a look at the following
> > blog for an initial report on performance. The results are for an iSCSI
> > backend in Oracle Cloud. We are working on collecting data on a much
> > heavier IOPS workload like an NVMe backend.
> >
> > https://blogs.oracle.com/linux/towards-a-more-secure-qemu-hypervisor%2c-part-3-of-3-v2
>
> Hard to reach a conclusion without also looking at CPU utilization.
> IOPS alone don't tell the story.
>
> If the system had spare CPU cycles then the performance results between
> built-in LSI and separate LSI will be similar but the efficiency
> (IOPS/CPU%) has actually decreased due to the extra CPU cycles required
> to forward the hardware register access to the device emulation process.
>
> If you rerun on a system without spare CPU cycles then IOPS degradation
> would become apparent.  I'm not saying this is necessarily the case,
> maybe the overhead really doesn't have a significant effect, but the
> graph shown in the blog post isn't enough to draw a conclusion either
> way.


Hi Stefan,

We are working on getting a better idea about the CPU utilization while 
the performance test is running. We're looking forward to discussing 
this during the forthcoming KVM meeting.


Thank you!
--
Jag



> Regarding the proposed QEMU bypass, these already exist in some form via
> kvm.ko's ioeventfd and coalesced MMIO features.
>
> Today ioeventfd is only used for performance-critical hardware
> registers, so kvm.ko doesn't use a sophisticated dispatch mechanism.  If
> you want to use it for all hardware register accesses handled by a
> separate process then ioeventfd probably needs to be tweaked somewhat to
> make it more scalable for that case.
>
> Coalesced MMIO is also cool.  kvm.ko can accumulate guest MMIO writes in
> a buffer that is only collected at a later point in time.  This improves
> performance for devices that require multiple hardware register writes
> to kick off an I/O operation (only the last one really needs to be
> trapped by the device 

Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-05-30 Thread Elena Ufimtseva
On Tue, May 28, 2019 at 08:18:20AM -0700, Elena Ufimtseva wrote:
> On Thu, May 23, 2019 at 12:11:30PM +0100, Stefan Hajnoczi wrote:
> > Hi Jag and Elena,
> > Do you think a call would help to move discussion along more quickly?
> >
> 
> Hi Stefan,
> 
> We would like to join this call.
> And thank you for inviting us!
> 
> Elena
> > We could use the next KVM Community Call on June 4th to discuss
> > remaining concerns and the next steps:
> > https://calendar.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
> >
> > I also hope to include other core QEMU developers.  As you know, I'm
> > skeptical, but it could be just me and I don't want to block you
> > unnecessarily if others are more enthusiastic about this approach.
> >

Hi Stefan

We have a few questions about the call.
What is the usual format of the call? Should we provide some kind of
project outline for 5 minutes?
We are planning to address some of the concerns you have voiced regarding
the amount of changes, usability,
security and performance. I assume there will be other questions as well. Is
there any time limit per topic?

And would you mind sharing the call details with us?

Thanks!
Elena
> 
> 
> > Stefan
> 
> 



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-05-28 Thread Elena Ufimtseva
On Thu, May 23, 2019 at 12:11:30PM +0100, Stefan Hajnoczi wrote:
> Hi Jag and Elena,
> Do you think a call would help to move discussion along more quickly?
>

Hi Stefan,

We would like to join this call.
And thank you for inviting us!

Elena
> We could use the next KVM Community Call on June 4th to discuss
> remaining concerns and the next steps:
> https://calendar.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
>
> I also hope to include other core QEMU developers.  As you know, I'm
> skeptical, but it could be just me and I don't want to block you
> unnecessarily if others are more enthusiastic about this approach.
>


> Stefan





Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-05-23 Thread Stefan Hajnoczi
On Tue, May 07, 2019 at 02:00:59PM -0700, Elena Ufimtseva wrote:
> On Mon, Mar 11, 2019 at 10:20:06AM +, Daniel P. Berrangé wrote:
> > On Thu, Mar 07, 2019 at 03:29:41PM -0800, John G Johnson wrote:
> > > 
> > > 
> 
> Hi Daniel, Stefan
> 
> We have not replied in a while as we were trying to figure out
> the best approach after multiple comments we have received on the
> patch series.
> 
> Leaving other concerns that you, Stefan and others shared with us
> out of this particular topic, we would like to get your opinion on
> the following approach.
> 
> Please see below.
> 
> > > > On Mar 7, 2019, at 11:27 AM, Stefan Hajnoczi  
> > > > wrote:
> > > > 
> > > > On Thu, Mar 07, 2019 at 02:51:20PM +, Daniel P. Berrangé wrote:
> > > >> I guess one obvious answer is that the existing security mechanisms like
> > > >> SELinux/AppArmor/DAC can be made to work in a more fine grained manner if
> > > >> there are distinct processes. This would allow for a more useful seccomp
> > > >> filter to better protect against secondary kernel exploits should QEMU
> > > >> itself be exploited, if we can protect individual components.
> > > > 
> > > > Fine-grained sandboxing is possible in theory but tedious in practice.
> > > > From what I can tell this patch series doesn't implement any sandboxing
> > > > for child processes.
> > > > 
> > > 
> > >   The policies aren’t in QEMU, but in the selinux config files.
> > > They would say, for example, that when the QEMU process exec()s the
> > > disk emulation process, the process security context type transitions
> > > to a new type.  This type would have permission to access the VM image
> > > objects, whereas the QEMU process type (and any other device emulation
> > > process types) cannot access them.
> > 
> > Note that currently all QEMU instances run by libvirt have seccomp
> > policy applied that explicitly forbids any use of fork+exec as a way
> > to reduce avenues of attack for an exploited QEMU.
> > 
> > Even in a modularized QEMU I'd be loath to allow QEMU to have the
> > fork+exec privilege, unless "QEMU" in this case was just a stub
> > process that does nothing more than fork+exec the other binaries,
> > while having zero attack surface exposed to the untrusted guest OS.
> 
> We see libvirt uses QEMU’s -sandbox option to indicate that QEMU
> should use seccomp() to prohibit future use of certain system calls,
> including fork() and exec().  Our idea is to enumerate the remote
> processes needed via QEMU command line options, and have QEMU exec()
> those processes before -sandbox is processed.
> We will also initialize seccomp for the emulated device processes.

Sounds good.

My experience with seccomp is that whitelisting syscalls is fragile
because of library dependencies.  Even glibc might invoke syscalls you
didn't expect, especially after a kernel/glibc upgrade, forcing you to
modify the whitelist.

However, once a whitelist is successfully in place it's a simple way to
reduce the syscall attack surface and I think it's worthwhile.




Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-05-23 Thread Stefan Hajnoczi
Hi Jag and Elena,
Do you think a call would help to move discussion along more quickly?

We could use the next KVM Community Call on June 4th to discuss
remaining concerns and the next steps:
https://calendar.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

I also hope to include other core QEMU developers.  As you know, I'm
skeptical, but it could be just me and I don't want to block you
unnecessarily if others are more enthusiastic about this approach.

Stefan




Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-05-23 Thread Stefan Hajnoczi
On Tue, May 07, 2019 at 03:00:52PM -0400, Jag Raman wrote:
> Hi Stefan,
> 
> Thank you very much for your feedback. Following is a summary of the
> discussions our team had regarding your feedback.
> 
> On 4/25/2019 11:44 AM, Stefan Hajnoczi wrote:
> > 
> > Can multiple LSI SCSI controllers be launched such that each process
> > only has access to a subset of disk images?  Or is the disk image label
> > per-VM so that there is no isolation between LSI SCSI controller
> > processes for that VM?
> 
> Yes, it is possible to provide each process with access to a subset of
> disk images. The Orchestrator (libvirt, etc.) assigns a set of MCS
> Categories to each VM, then device instances can be isolated by being
> assigned a subset of the VM’s Categories.
> 
> > 
> > My concern with this overall approach is the practicality vs its
> > benefits.  Regarding practicality, each emulated device needs to be
> > proxied separately.  The QEMU subsystem used by the device also needs to
> > be proxied.  Global state, monitor commands, and live migration all
> > require code changes to support proxied operation.  This is very
> > invasive.
> > 
> > Then each emulated device needs an SELinux policy to achieve the
> > benefits of confinement.  I have no idea how to correctly write a policy
> > like this and it's likely that developers who contribute a single new
> > device will not be proficient in it either.  Writing these policies is a
> > rare thing and few people will be good at this.  It also makes me worry
> > about how we test and review them.
> 
> We also think that having an SELinux policy per device would become
> complicated. Our proposal, therefore, is to define SELinux policies for
> each device class - viz. disk, network, console, graphics, etc.
> "fedora-selinux" upstream repo. [1] will contain these policies, so the
> device developer doesn't have to worry about defining new policies for
> each device. This proposal would diminish the complexity of SELinux
> policies.

Have you considered using Linux namespaces?  I'm beginning to think that
SELinux becomes less relevant with pid and mount namespaces to isolate
processes.  The advantage of namespaces is that they are easy to
understand and can be expressed in code instead of a policy file in a
separate package.  This is the approach we're taking with virtiofsd
(vhost-user device backend for virtio-fs).

> > 
> > Despite the efforts required in making this work, all processes still
> > effectively have full access to the guest since they can access guest
> > RAM.  What I mean is that the device is actually not confined to its
> > host process (e.g. LSI SCSI controller process) because it can write
> > code to executable guest RAM pages.  The guest will then execute that
> > code and therefore all guest I/O (networking, disk, etc) is still
> > available indirectly to the "confined" processes.  They are not really
> > sandboxed from the outside world, regardless of how strict the SELinux
> > policy is :(.
> > 
> > There are performance issues due to proxying as well, but let's ignore
> > them for now and focus on security.
> 
> We are also focusing on performance. Please take a look at the following
> blog for an initial report on performance. The results are for an iSCSI
> backend in Oracle Cloud. We are working on collecting data on a much
> heavier IOPS workload like an NVMe backend.
> 
> https://blogs.oracle.com/linux/towards-a-more-secure-qemu-hypervisor%2c-part-3-of-3-v2

Hard to reach a conclusion without also looking at CPU utilization.
IOPS alone don't tell the story.

If the system had spare CPU cycles then the performance results between
built-in LSI and separate LSI will be similar but the efficiency
(IOPS/CPU%) has actually decreased due to the extra CPU cycles required
to forward the hardware register access to the device emulation process.

If you rerun on a system without spare CPU cycles then IOPS degradation
would become apparent.  I'm not saying this is necessarily the case,
maybe the overhead really doesn't have a significant effect, but the
graph shown in the blog post isn't enough to draw a conclusion either
way.
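
To make the IOPS/CPU% point concrete, here is a minimal sketch of the
arithmetic. The numbers are hypothetical and are not taken from the blog
post; they only illustrate how two configurations can report the same IOPS
while one of them is less efficient.

```python
def efficiency(iops: float, cpu_percent: float) -> float:
    """Return IOPS delivered per percentage point of host CPU consumed."""
    if cpu_percent <= 0:
        raise ValueError("cpu_percent must be positive")
    return iops / cpu_percent


# Hypothetical measurements (assumptions for illustration only).
builtin = efficiency(iops=50_000, cpu_percent=40.0)    # built-in LSI
separate = efficiency(iops=50_000, cpu_percent=50.0)   # separate LSI process

# Fraction of efficiency lost to forwarding register accesses.
overhead = 1 - separate / builtin

print(f"built-in: {builtin:.0f} IOPS per CPU%")
print(f"separate: {separate:.0f} IOPS per CPU%")
print(f"efficiency lost: {overhead:.0%}")
```

With spare CPU cycles the IOPS columns match, so the difference only shows
up once CPU utilization is measured alongside IOPS.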

Regarding the proposed QEMU bypass, these already exist in some form via
kvm.ko's ioeventfd and coalesced MMIO features.

Today ioeventfd is only used for performance-critical hardware
registers, so kvm.ko doesn't use a sophisticated dispatch mechanism.  If
you want to use it for all hardware register accesses handled by a
separate process then ioeventfd probably needs to be tweaked somewhat to
make it more scalable for that case.

Coalesced MMIO is also cool.  kvm.ko can accumulate guest MMIO writes in
a buffer that is only collected at a later point in time.  This improves
performance for devices that require multiple hardware register writes
to kick off an I/O operation (only the last one really needs to be
trapped by the device emulation code!).  This sounds similar to an MMIO
access shared ring buffer.
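
The coalescing idea can be sketched with a toy model. This is not the
kvm.ko interface; the class name, register offsets, and counters below are
assumptions made up for illustration. Writes to registers marked as
coalesced are buffered without a trap, and the buffer is drained when a
write that does trap arrives.

```python
from dataclasses import dataclass, field


@dataclass
class CoalescedMmioModel:
    """Toy model of coalesced MMIO (hypothetical, not the KVM API)."""
    coalesced: set[int]                                   # offsets that may be buffered
    ring: list[tuple[int, int]] = field(default_factory=list)
    handled: list[tuple[int, int]] = field(default_factory=list)
    exits: int = 0                                        # simulated traps to the emulator

    def guest_write(self, offset: int, value: int) -> None:
        if offset in self.coalesced:
            # Buffered: no trap, the write is recorded for later.
            self.ring.append((offset, value))
            return
        # Trapping write: one exit, drain buffered writes in order first.
        self.exits += 1
        self.handled.extend(self.ring)
        self.ring.clear()
        self.handled.append((offset, value))


# Three setup registers are coalesced; 0x0c acts as the "kick" register.
dev = CoalescedMmioModel(coalesced={0x00, 0x04, 0x08})
dev.guest_write(0x00, 1)
dev.guest_write(0x04, 2)
dev.guest_write(0x08, 3)
dev.guest_write(0x0C, 1)

print(dev.exits)    # number of simulated traps
print(dev.handled)  # writes seen by the emulator, in guest order
```

In this model four register writes cost a single trap, which is the saving
described above.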

> > 
> > How do the 

Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-05-07 Thread Elena Ufimtseva
On Mon, Mar 11, 2019 at 10:20:06AM +, Daniel P. Berrangé wrote:
> On Thu, Mar 07, 2019 at 03:29:41PM -0800, John G Johnson wrote:
> > 
> > 

Hi Daniel, Stefan

We have not replied in a while as we were trying to figure out
the best approach after multiple comments we have received on the
patch series.

Leaving other concerns that you, Stefan and others shared with us
out of this particular topic, we would like to get your opinion on
the following approach.

Please see below.

> > > On Mar 7, 2019, at 11:27 AM, Stefan Hajnoczi  wrote:
> > > 
> > > On Thu, Mar 07, 2019 at 02:51:20PM +, Daniel P. Berrangé wrote:
> > >> I guess one obvious answer is that the existing security mechanisms like
> > >> SELinux/AppArmor/DAC can be made to work in a more fine grained manner if
> > >> there are distinct processes. This would allow for a more useful seccomp
> > >> filter to better protect against secondary kernel exploits should QEMU
> > >> itself be exploited, if we can protect individual components.
> > > 
> > > Fine-grained sandboxing is possible in theory but tedious in practice.
> > > From what I can tell this patch series doesn't implement any sandboxing
> > > for child processes.
> > > 
> > 
> > The policies aren’t in QEMU, but in the selinux config files.
> > They would say, for example, that when the QEMU process exec()s the
> > disk emulation process, the process security context type transitions
> > to a new type.  This type would have permission to access the VM image
> > objects, whereas the QEMU process type (and any other device emulation
> > process types) cannot access them.
> 
> Note that currently all QEMU instances run by libvirt have seccomp
> policy applied that explicitly forbids any use of fork+exec as a way
> to reduce avenues of attack for an exploited QEMU.
> 
> Even in a modularized QEMU I'd be loath to allow QEMU to have the
> fork+exec privilege, unless "QEMU" in this case was just a stub
> process that does nothing more than fork+exec the other binaries,
> while having zero attack surface exposed to the untrusted guest OS.

We see libvirt uses QEMU’s -sandbox option to indicate that QEMU
should use seccomp() to prohibit future use of certain system calls,
including fork() and exec().  Our idea is to enumerate the remote
processes needed via QEMU command line options, and have QEMU exec()
those processes before -sandbox is processed.
We will also initialize seccomp for the emulated device processes.
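
The ordering described here (spawn every remote process first, then turn
on the sandbox) can be sketched as follows. This is only a model of the
ordering: the Launcher class is hypothetical, and the "sandboxed" flag
stands in for a real seccomp filter, which this sketch does not install.

```python
import subprocess
import sys


class Launcher:
    """Hypothetical model: spawning is refused once the sandbox is on."""

    def __init__(self):
        self.sandboxed = False
        self.children = []

    def spawn(self, argv):
        if self.sandboxed:
            # A real seccomp filter would block fork()/exec() in the kernel.
            raise PermissionError("spawn attempted after sandbox was enabled")
        proc = subprocess.Popen(argv)
        self.children.append(proc)
        return proc

    def enable_sandbox(self):
        self.sandboxed = True


def start(remote_commands):
    """Spawn all enumerated remote processes, then enable the sandbox."""
    launcher = Launcher()
    for argv in remote_commands:
        launcher.spawn(argv)
    launcher.enable_sandbox()
    return launcher


# Stand-in for a device emulation process: a Python child that exits at once.
launcher = start([[sys.executable, "-c", "pass"]])
for child in launcher.children:
    child.wait()
```

Any process that has to be created later, for example on hotplug, cannot
come from the sandboxed launcher in this model and would need a different
path.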

> 
> > If you wanted to use DAC, you could do something similar by
> > making the disk emulation executable setuid to a UID that can access
> > VM image files.
> > 
> > In either case, the policies and permissions are set up before
> > libvirt even runs, so it doesn’t need to be aware of them.
> 
> That's not the case bearing in mind the above point about fork+exec
> being forbidden. It would likely require libvirt to be in charge of
> spawning the various helper binaries from a trusted context.
> 
> 
> > > How to do this in practice must be clear from the beginning if
> > > fine-grained sandboxing is the main selling point.
> > > 
> > > Some details to start the discussion:
> > > 
> > > * How will fine-grained SELinux/AppArmor/DAC policies be configured for
> > >   each process?  I guess this requires root, so does libvirt need to
> > >   know about each process?
> > > 
> > 
> > The policies would apply to process security context types (or
> > UIDs in a DAC regime), so I would not expect libvirt to be aware of them.
> 
> I'm pretty skeptical that such a large modularization of QEMU can be
> done without libvirt being aware of it & needing some kind of changes
> applied.
>

We agree with that. With the approach proposed above we still have to change
hotplug in some way.
If a separate process needs to be spawned, libvirt will be the one doing
the fork/exec of the separate processes. Alternatively, libvirt could launch
a helper binary that unifies how an instance with multiple processes is
started and how hotplug is handled.

Thanks!
Elena, Jag, John.


> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-05-07 Thread Jag Raman

Hi Stefan,

Thank you very much for your feedback. Following is a summary of the
discussions our team had regarding your feedback.

On 4/25/2019 11:44 AM, Stefan Hajnoczi wrote:


> Can multiple LSI SCSI controllers be launched such that each process
> only has access to a subset of disk images?  Or is the disk image label
> per-VM so that there is no isolation between LSI SCSI controller
> processes for that VM?


Yes, it is possible to provide each process with access to a subset of
disk images. The Orchestrator (libvirt, etc.) assigns a set of MCS
Categories to each VM, then device instances can be isolated by being
assigned a subset of the VM’s Categories.
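
A rough sketch of the category-subset idea follows. It models only the
MCS-style rule that a process must hold every category carried by an
object; real SELinux decisions are made by the kernel and also involve
type enforcement. The category names and the device/disk assignments are
hypothetical.

```python
def can_access(process_categories: frozenset, object_categories: frozenset) -> bool:
    """MCS-style check: the process must hold every category on the object."""
    return object_categories <= process_categories


# Hypothetical categories assigned to one VM by the orchestrator.
vm_categories = frozenset({"c10", "c20"})

# Each device process receives a subset of the VM's categories.
scsi0 = frozenset({"c10"})
scsi1 = frozenset({"c20"})
assert scsi0 <= vm_categories and scsi1 <= vm_categories

# Disk images are labeled with the category of the controller serving them.
disk_a = frozenset({"c10"})
disk_b = frozenset({"c20"})

print(can_access(scsi0, disk_a))  # scsi0 against its own disk
print(can_access(scsi0, disk_b))  # scsi0 against the other controller's disk
```

Under these assumptions each controller process reaches only the disk
labeled with its own category.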



> My concern with this overall approach is the practicality vs its
> benefits.  Regarding practicality, each emulated device needs to be
> proxied separately.  The QEMU subsystem used by the device also needs to
> be proxied.  Global state, monitor commands, and live migration all
> require code changes to support proxied operation.  This is very
> invasive.
>
> Then each emulated device needs an SELinux policy to achieve the
> benefits of confinement.  I have no idea how to correctly write a policy
> like this and it's likely that developers who contribute a single new
> device will not be proficient in it either.  Writing these policies is a
> rare thing and few people will be good at this.  It also makes me worry
> about how we test and review them.


We also think that having an SELinux policy per device would become
complicated. Our proposal, therefore, is to define SELinux policies for
each device class - viz. disk, network, console, graphics, etc.
The upstream "fedora-selinux" repo [1] will contain these policies, so the
device developer doesn't have to worry about defining new policies for
each device. This proposal would diminish the complexity of SELinux
policies.



> Despite the efforts required in making this work, all processes still
> effectively have full access to the guest since they can access guest
> RAM.  What I mean is that the device is actually not confined to its
> host process (e.g. LSI SCSI controller process) because it can write
> code to executable guest RAM pages.  The guest will then execute that
> code and therefore all guest I/O (networking, disk, etc) is still
> available indirectly to the "confined" processes.  They are not really
> sandboxed from the outside world, regardless of how strict the SELinux
> policy is :(.
>
> There are performance issues due to proxying as well, but let's ignore
> them for now and focus on security.


We are also focusing on performance. Please take a look at the following
blog for an initial report on performance. The results are for an iSCSI
backend in Oracle Cloud. We are working on collecting data on a much
heavier IOPS workload like an NVMe backend.

https://blogs.oracle.com/linux/towards-a-more-secure-qemu-hypervisor%2c-part-3-of-3-v2



> How do the benefits compare against today's monolithic approach?  If the
> guest exploits monolithic QEMU it has full access to all host files and
> APIs available to QEMU.  However, these are largely just the resources
> that belong to the guest anyway - not resources we are trying to keep
> away from the guest.  With multi-process QEMU each process still has
> access to all guest interfaces via the code injection I mentioned above,
> but the SELinux policy could restrict access to some resources.  But
> this benefit is really small in my opinion, given that the resources
> belong to the guest anyway and the guest can already access them.


The primary focus of our project is to defend the host from a malicious
guest. The code injection problem you outlined above involves part of
the guest attacking itself, but not the host. Therefore, this wouldn't
compromise our objective.

As you know, there are some parts of QEMU which are not directly
accessible from the guest (via drivers, etc.), which we prefer to call
the control plane. It executes ioctls to the host kernel and has access
to a broader set of syscalls, which the device emulation code doesn’t
need. We want to protect the control plane from emulated devices. In the
case where a device injects code into the RAM to attack another device
on the same VM, the control plane would still be protected.

Another benefit of the project is the detection and reporting of
failures in the emulated devices. For instance, in cases like
CVE-2018-18849, where an emulated device hangs or crashes, the failure
would not take down the QEMU process with it. QEMU could detect the
failure, log the problem, and exit, instead of hanging or dumping core.
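
To make the failure-detection point concrete, here is a minimal Python
sketch (not code from the patch series) of a parent process noticing how
a separate device process ended. The name run_device_process and the
stand-in child command are hypothetical; QEMU is written in C and would
use its own process-management code for this.

```python
import signal
import subprocess
import sys


def run_device_process(argv):
    """Run a (hypothetical) device emulation process and report how it ended."""
    proc = subprocess.Popen(argv)
    returncode = proc.wait()
    if returncode == 0:
        return "exited cleanly"
    if returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    # Stand-in "device" that kills itself, to mimic a crashing emulation process.
    crash = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
    print(run_device_process([sys.executable, "-c", crash]))
```

The parent survives the child's death and can log the result, which is
the behaviour we are after; a monolithic process has no such boundary.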



I think you can implement this for a handful of devices as a one-time
thing, but the invasiveness and the impracticality of getting wide coverage
of QEMU make this approach questionable.

Am I mistaken about the invasiveness or impracticality?


We are not planning to implement this for all devices since it would be
impractical. But the project adds a framework for implementing more
devices in the future.

One other thing we would 

Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-04-25 Thread Stefan Hajnoczi
On Tue, Apr 23, 2019 at 05:26:33PM -0400, Jag Raman wrote:
> On 3/26/2019 6:20 PM, Philippe Mathieu-Daudé wrote:
> 
> > > > > Please share the SELinux policy files, containerization scripts, etc.
> > > > > There is probably a home for them in qemu.git, libvirt.git, or 
> > > > > elsewhere
> > > > > upstream.
> > > > > 
> > > > > We need to find a way to make the sandboxing improvements available to
> > > > > users besides yourself and easily reusable for developers who wish to
> > > > > convert additional device models.
> > > 
> > 
> > Also for testing this series.
> 
> Hi,
> 
> We are wondering how to deliver the example SELinux policies. I have
> posted on Fedora's SELinux mailing list to get information on how to
> upstream SELinux policy.
> 
> We are developing SELinux Type Enforcements and MCS labels to sandbox
> the emulation process. Details of an example Type Enforcement are
> available below.
> 
> We are also working on changes to libvirt, to launch the remote process
> and apply MCS labels. Libvirt changes will be posted separately in the
> future.
> 
> The Type Enforcements for SELinux are available in the pastebin location
> below (also copied at the end of this email):
> https://pastebin.com/t1bpS6MY

Can multiple LSI SCSI controllers be launched such that each process
only has access to a subset of disk images?  Or is the disk image label
per-VM so that there is no isolation between LSI SCSI controller
processes for that VM?

My concern with this overall approach is the practicality vs its
benefits.  Regarding practicality, each emulated device needs to be
proxied separately.  The QEMU subsystem used by the device also needs to
be proxied.  Global state, monitor commands, and live migration all
require code changes to support proxied operation.  This is very
invasive.

Then each emulated device needs an SELinux policy to achieve the
benefits of confinement.  I have no idea how to correctly write a policy
like this and it's likely that developers who contribute a single new
device will not be proficient in it either.  Writing these policies is a
rare thing and few people will be good at this.  It also makes me worry
about how we test and review them.

Despite the efforts required in making this work, all processes still
effectively have full access to the guest since they can access guest
RAM.  What I mean is that the device is actually not confined to its
host process (e.g. LSI SCSI controller process) because it can write
code to executable guest RAM pages.  The guest will then execute that
code and therefore all guest I/O (networking, disk, etc) is still
available indirectly to the "confined" processes.  They are not really
sandboxed from the outside world, regardless of how strict the SELinux
policy is :(.

There are performance issues due to proxying as well, but let's ignore
them for now and focus on security.

How do the benefits compare against today's monolithic approach?  If the
guest exploits monolithic QEMU it has full access to all host files and
APIs available to QEMU.  However, these are largely just the resources
that belong to the guest anyway - not resources we are trying to keep
away from the guest.  With multi-process QEMU each process still has
access to all guest interfaces via the code injection I mentioned above,
but the SELinux policy could restrict access to some resources.  But
this benefit is really small in my opinion, given that the resources
belong to the guest anyway and the guest can already access them.

I think you can implement this for a handful of devices as a one-time
thing, but the invasiveness and the impracticality of getting wide coverage
of QEMU make this approach questionable.

Am I mistaken about the invasiveness or impracticality?

Am I misunderstanding the security benefits compared to what already
exists today?

A more practical approach is to strip down QEMU (compiling out unused
devices and features) and to run virtio devices in vhost-user processes
(e.g. virtio-input, virtio-gpu, virtio-fs).  This achieves similar goals
without proxy objects or invasive changes to QEMU since the vhost-user
devices use a different codebase and aren't accessible via the QEMU
monitor.  The limitation is that existing QEMU code and non-virtio
devices aren't available in this model.

Stefan




Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-04-23 Thread Jag Raman




On 3/26/2019 6:20 PM, Philippe Mathieu-Daudé wrote:


Please share the SELinux policy files, containerization scripts, etc.
There is probably a home for them in qemu.git, libvirt.git, or elsewhere
upstream.

We need to find a way to make the sandboxing improvements available to
users besides yourself and easily reusable for developers who wish to
convert additional device models.




Also for testing this series.


Hi,

We are wondering how to deliver the example SELinux policies. I have
posted on Fedora's SELinux mailing list to get information on how to
upstream SELinux policy.

We are developing SELinux Type Enforcements and MCS labels to sandbox
the emulation process. Details of an example Type Enforcement are
available below.

We are also working on changes to libvirt, to launch the remote process
and apply MCS labels. Libvirt changes will be posted separately in the
future.

The Type Enforcements for SELinux are available in the pastebin location
below (also copied at the end of this email):
https://pastebin.com/t1bpS6MY

An RPM package which installs this policy as an SELinux module and
configures the file contexts for the executables is available for
download at the link below:
http://wikisend.com/download/156700/mpqemu-selinux-example-1.0-1.fc29.noarch.rpm

The README for the RPM can be obtained by running the following commands:
# rpm2cpio ./mpqemu-selinux-example-1.0-1.fc29.noarch.rpm | cpio -idmv
# cat opt/mpqemu-selinux-example/doc/README

Thanks!
--
Jag


--
mpqemu.te:
--

module mpqemu 1.0;


require {
class process transition;
class file { execute read };
class file entrypoint;
class dir search;
class file { getattr open read };
class file { getattr map open read };
class file { execute map read };
class lnk_file read;
class chr_file { lock open read write };
class file { getattr ioctl lock open read write };
class process fork;
class fd use;
class unix_stream_socket { read write };
class file open;
class process { noatsecure rlimitinh siginh };
class file write;
class dir { getattr search };
class file { open read };
class process getattr;
type qemu_t;
type qemu_exec_t;
type virtd_t;
type ld_so_cache_t;
type ld_so_t;
type lib_t;
type null_device_t;
type virt_image_t;
type shell_exec_t;
type init_t;
attribute domain;
attribute entry_type;
attribute exec_type;
attribute application_exec_type;
attribute file_type, non_security_file_type, non_auth_file_type;
attribute virt_domain;
attribute virt_image_type;

};


type qemu_lsi53c895a_exec_t;
type qemu_lsi53c895a_img_t;
type qemu_lsi53c895a_t;

typeattribute qemu_lsi53c895a_t virt_domain;

typeattribute qemu_lsi53c895a_exec_t file_type, non_security_file_type, 
non_auth_file_type;

typeattribute qemu_lsi53c895a_exec_t exec_type;
typeattribute qemu_lsi53c895a_exec_t application_exec_type;
typeattribute qemu_lsi53c895a_exec_t entry_type;
typeattribute qemu_lsi53c895a_img_t file_type, non_security_file_type, 
non_auth_file_type;

typeattribute qemu_lsi53c895a_img_t virt_image_type;
type_transition qemu_t qemu_lsi53c895a_exec_t : process qemu_lsi53c895a_t;
type_transition virtd_t qemu_exec_t : process qemu_t;

#= init_t ==
allow init_t qemu_lsi53c895a_t:dir search;
allow init_t qemu_lsi53c895a_t:file { getattr open read };

#= qemu_lsi53c895a_t ==
allow qemu_lsi53c895a_t ld_so_cache_t : file { getattr map open read };
allow qemu_lsi53c895a_t ld_so_t : file { execute map read };
allow qemu_lsi53c895a_t lib_t : lnk_file read;
allow qemu_lsi53c895a_t null_device_t : chr_file { lock open read write };
allow qemu_lsi53c895a_t qemu_lsi53c895a_exec_t : file { execute map read };
allow qemu_lsi53c895a_t qemu_lsi53c895a_img_t : file { getattr ioctl 
lock open read write };

allow qemu_lsi53c895a_t self : process fork;
allow qemu_lsi53c895a_t qemu_t : fd use;
allow qemu_lsi53c895a_t qemu_t : unix_stream_socket { read write };
allow qemu_lsi53c895a_t qemu_lsi53c895a_exec_t : file entrypoint;

#= qemu_t ==
allow qemu_t qemu_lsi53c895a_exec_t : file open;
allow qemu_t qemu_lsi53c895a_t : process { noatsecure rlimitinh siginh };
allow qemu_t virt_image_t : file write;
allow qemu_t qemu_lsi53c895a_t : process transition;
allow qemu_t qemu_lsi53c895a_exec_t : file { execute read };

#= virtd_t ==
allow virtd_t shell_exec_t : file entrypoint;
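
As a reading aid for the policy above, here is a small Python sketch that
parses a few of its statements and answers "which type may do what". It
is only a toy: parse_policy and the two regular expressions are
hypothetical helpers written for this illustration, the excerpt is copied
from mpqemu.te, and attributes such as virt_domain and virt_image_type
are ignored. Real access decisions are made by the kernel against the
full loaded policy, so "not in this excerpt" does not mean "denied".

```python
import re

# A few statements copied from mpqemu.te above.
POLICY_EXCERPT = """
type_transition qemu_t qemu_lsi53c895a_exec_t : process qemu_lsi53c895a_t;
allow qemu_lsi53c895a_t qemu_lsi53c895a_img_t : file { getattr ioctl
lock open read write };
allow qemu_t qemu_lsi53c895a_t : process transition;
allow qemu_t qemu_lsi53c895a_exec_t : file { execute read };
allow qemu_t virt_image_t : file write;
"""

ALLOW_RE = re.compile(
    r"allow\s+(\w+)\s+(\w+)\s*:\s*(\w+)\s+(?:\{([^}]*)\}|(\w+))")
TRANSITION_RE = re.compile(
    r"type_transition\s+(\w+)\s+(\w+)\s*:\s*process\s+(\w+)")


def parse_policy(text):
    """Return (allow tuples, transition map) for the statements in text."""
    allows, transitions = set(), {}
    for statement in text.split(";"):
        statement = " ".join(statement.split())  # undo line wrapping
        m = ALLOW_RE.fullmatch(statement)
        if m:
            source, target, tclass, perm_set, single_perm = m.groups()
            for perm in (perm_set or single_perm).split():
                allows.add((source, target, tclass, perm))
            continue
        m = TRANSITION_RE.fullmatch(statement)
        if m:
            # (domain doing the exec, type of the executable) -> new domain
            transitions[(m.group(1), m.group(2))] = m.group(3)
    return allows, transitions


if __name__ == "__main__":
    allows, transitions = parse_policy(POLICY_EXCERPT)
    print(transitions[("qemu_t", "qemu_lsi53c895a_exec_t")])
    print(("qemu_lsi53c895a_t", "qemu_lsi53c895a_img_t", "file", "read") in allows)
    print(("qemu_t", "qemu_lsi53c895a_img_t", "file", "read") in allows)
```

Within this excerpt, only the qemu_lsi53c895a_t domain is granted read
access to files labelled qemu_lsi53c895a_img_t, and qemu_t enters that
domain only by executing the qemu_lsi53c895a_exec_t binary.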




Stefan








Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-27 Thread Stefan Hajnoczi
On Tue, Mar 26, 2019 at 10:31:53AM -0400, Jag Raman wrote:
> 
> 
> On 3/26/2019 4:08 AM, Stefan Hajnoczi wrote:
> > On Fri, Mar 08, 2019 at 09:50:36AM +0000, Stefan Hajnoczi wrote:
> > > On Thu, Mar 07, 2019 at 03:29:41PM -0800, John G Johnson wrote:
> > > > > On Mar 7, 2019, at 11:27 AM, Stefan Hajnoczi wrote:
> > > > > On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> > > > > > On Thu, Mar 07, 2019 at 02:26:09PM +0000, Stefan Hajnoczi wrote:
> > > > > > > On Wed, Mar 06, 2019 at 11:22:53PM -0800, elena.ufimts...@oracle.com wrote:
> > > > > > > > diff --git a/docs/devel/qemu-multiprocess.txt b/docs/devel/qemu-multiprocess.txt
> > > > > > > > new file mode 100644
> > > > > > > > index 0000000..e29c6c8
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/docs/devel/qemu-multiprocess.txt
> > > > > > > 
> > > > > > > Thanks for this document and the interesting work that you are 
> > > > > > > doing.
> > > > > > > I'd like to discuss the security advantages gained by 
> > > > > > > disaggregating
> > > > > > > QEMU in more detail.
> > > > > > > 
> > > > > > > The security model for VMs managed by libvirt (most production 
> > > > > > > x86, ppc,
> > > > > > > s390 guests) is that the QEMU process is untrusted and only has 
> > > > > > > access
> > > > > > > to resources belonging to the guest.  SELinux is used to restrict 
> > > > > > > the
> > > > > > > process from accessing other files, processes, etc on the host.
> > > > > > 
> > > > > > NB it doesn't have to be SELinux. Libvirt also supports AppArmor and
> > > > > > can even do isolation with traditional DAC by putting each QEMU 
> > > > > > under
> > > > > > a distinct UID/GID and having libvirtd set ownership on resources 
> > > > > > each
> > > > > > VM is permitted to use.
> > > > > > 
> > > > > > > QEMU does not hold privileged resources that must be kept away 
> > > > > > > from the
> > > > > > > guest.  An escaped guest can access its image file, tap file 
> > > > > > > descriptor,
> > > > > > > etc but they are the same resources it could already access via 
> > > > > > > device
> > > > > > > emulation.
> > > > > > > 
> > > > > > > Can you give specific examples of how disaggregation improves 
> > > > > > > security?
> > > > > 
> > > > > Elena & collaborators: Dan has posted some ideas but please share 
> > > > > yours
> > > > > so the security benefits of this patch series can be better 
> > > > > understood.
> > > > > 
> > > > 
> > > > Dan covered the main point.  The security regime we use 
> > > > (selinux)
> > > > constrains the actions of processes on objects, so having multiple 
> > > > processes
> > > > allows us to apply more fine-grained policies.
> > > 
> > > Please share the SELinux policy files, containerization scripts, etc.
> > > There is probably a home for them in qemu.git, libvirt.git, or elsewhere
> > > upstream.
> > > 
> > > We need to find a way to make the sandboxing improvements available to
> > > users besides yourself and easily reusable for developers who wish to
> > > convert additional device models.
> > 
> > Ping?
> > 
> > Without the scripts/policies there is no security benefit from this
> > patch series.
> 
> Hi Stefan,
> 
> We are working on this. We'll get back to you once we have this
> available.

Great, thanks!

Stefan




Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-26 Thread Philippe Mathieu-Daudé
On Tue, Mar 26, 2019 at 15:34, Jag Raman wrote:

>
>
> On 3/26/2019 4:08 AM, Stefan Hajnoczi wrote:
> > On Fri, Mar 08, 2019 at 09:50:36AM +0000, Stefan Hajnoczi wrote:
> >> On Thu, Mar 07, 2019 at 03:29:41PM -0800, John G Johnson wrote:
> >>>> On Mar 7, 2019, at 11:27 AM, Stefan Hajnoczi wrote:
> >>>> On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> >>>>> On Thu, Mar 07, 2019 at 02:26:09PM +0000, Stefan Hajnoczi wrote:
> >>>>>> On Wed, Mar 06, 2019 at 11:22:53PM -0800, elena.ufimts...@oracle.com wrote:
> >>>>>>> diff --git a/docs/devel/qemu-multiprocess.txt b/docs/devel/qemu-multiprocess.txt
> >>>>>>> new file mode 100644
> >>>>>>> index 0000000..e29c6c8
> >>>>>>> --- /dev/null
> >>>>>>> +++ b/docs/devel/qemu-multiprocess.txt
> >>>>>>
> >>>>>> Thanks for this document and the interesting work that you are doing.
> >>>>>> I'd like to discuss the security advantages gained by disaggregating
> >>>>>> QEMU in more detail.
> >>>>>>
> >>>>>> The security model for VMs managed by libvirt (most production x86, ppc,
> >>>>>> s390 guests) is that the QEMU process is untrusted and only has access
> >>>>>> to resources belonging to the guest.  SELinux is used to restrict the
> >>>>>> process from accessing other files, processes, etc on the host.
> >>>>>
> >>>>> NB it doesn't have to be SELinux. Libvirt also supports AppArmor and
> >>>>> can even do isolation with traditional DAC by putting each QEMU under
> >>>>> a distinct UID/GID and having libvirtd set ownership on resources each
> >>>>> VM is permitted to use.
> >>>>>
> >>>>>> QEMU does not hold privileged resources that must be kept away from the
> >>>>>> guest.  An escaped guest can access its image file, tap file descriptor,
> >>>>>> etc but they are the same resources it could already access via device
> >>>>>> emulation.
> >>>>>>
> >>>>>> Can you give specific examples of how disaggregation improves security?
> >>>>
> >>>> Elena & collaborators: Dan has posted some ideas but please share yours
> >>>> so the security benefits of this patch series can be better understood.
> >>>>
> >>>
> >>> Dan covered the main point.  The security regime we use (selinux)
> >>> constrains the actions of processes on objects, so having multiple processes
> >>> allows us to apply more fine-grained policies.
> >>
> >> Please share the SELinux policy files, containerization scripts, etc.
> >> There is probably a home for them in qemu.git, libvirt.git, or elsewhere
> >> upstream.
> >>
> >> We need to find a way to make the sandboxing improvements available to
> >> users besides yourself and easily reusable for developers who wish to
> >> convert additional device models.
>

Also for testing this series.

>
> > Ping?
> >
> > Without the scripts/policies there is no security benefit from this
> > patch series.
>
> Hi Stefan,
>
> We are working on this. We'll get back to you once we have this
> available.
>
> Thanks!
> --
> Jag
>
> >
> > Stefan
> >
>
>


Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-26 Thread Jag Raman




On 3/26/2019 4:08 AM, Stefan Hajnoczi wrote:

On Fri, Mar 08, 2019 at 09:50:36AM +0000, Stefan Hajnoczi wrote:

On Thu, Mar 07, 2019 at 03:29:41PM -0800, John G Johnson wrote:

On Mar 7, 2019, at 11:27 AM, Stefan Hajnoczi wrote:
On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:

On Thu, Mar 07, 2019 at 02:26:09PM +0000, Stefan Hajnoczi wrote:

On Wed, Mar 06, 2019 at 11:22:53PM -0800, elena.ufimts...@oracle.com wrote:

diff --git a/docs/devel/qemu-multiprocess.txt b/docs/devel/qemu-multiprocess.txt
new file mode 100644
index 0000000..e29c6c8
--- /dev/null
+++ b/docs/devel/qemu-multiprocess.txt


Thanks for this document and the interesting work that you are doing.
I'd like to discuss the security advantages gained by disaggregating
QEMU in more detail.

The security model for VMs managed by libvirt (most production x86, ppc,
s390 guests) is that the QEMU process is untrusted and only has access
to resources belonging to the guest.  SELinux is used to restrict the
process from accessing other files, processes, etc on the host.


NB it doesn't have to be SELinux. Libvirt also supports AppArmor and
can even do isolation with traditional DAC by putting each QEMU under
a distinct UID/GID and having libvirtd set ownership on resources each
VM is permitted to use.


QEMU does not hold privileged resources that must be kept away from the
guest.  An escaped guest can access its image file, tap file descriptor,
etc but they are the same resources it could already access via device
emulation.

Can you give specific examples of how disaggregation improves security?


Elena & collaborators: Dan has posted some ideas but please share yours
so the security benefits of this patch series can be better understood.



Dan covered the main point.  The security regime we use (selinux)
constrains the actions of processes on objects, so having multiple processes
allows us to apply more fine-grained policies.


Please share the SELinux policy files, containerization scripts, etc.
There is probably a home for them in qemu.git, libvirt.git, or elsewhere
upstream.

We need to find a way to make the sandboxing improvements available to
users besides yourself and easily reusable for developers who wish to
convert additional device models.


Ping?

Without the scripts/policies there is no security benefit from this
patch series.


Hi Stefan,

We are working on this. We'll get back to you once we have this
available.

Thanks!
--
Jag



Stefan





Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-26 Thread Stefan Hajnoczi
On Fri, Mar 08, 2019 at 09:50:36AM +0000, Stefan Hajnoczi wrote:
> On Thu, Mar 07, 2019 at 03:29:41PM -0800, John G Johnson wrote:
> > > On Mar 7, 2019, at 11:27 AM, Stefan Hajnoczi wrote:
> > > On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> > >> On Thu, Mar 07, 2019 at 02:26:09PM +0000, Stefan Hajnoczi wrote:
> > >>> On Wed, Mar 06, 2019 at 11:22:53PM -0800, elena.ufimts...@oracle.com wrote:
> > >>>> diff --git a/docs/devel/qemu-multiprocess.txt b/docs/devel/qemu-multiprocess.txt
> > >>>> new file mode 100644
> > >>>> index 0000000..e29c6c8
> > >>>> --- /dev/null
> > >>>> +++ b/docs/devel/qemu-multiprocess.txt
> > >>> 
> > >>> Thanks for this document and the interesting work that you are doing.
> > >>> I'd like to discuss the security advantages gained by disaggregating
> > >>> QEMU in more detail.
> > >>> 
> > >>> The security model for VMs managed by libvirt (most production x86, ppc,
> > >>> s390 guests) is that the QEMU process is untrusted and only has access
> > >>> to resources belonging to the guest.  SELinux is used to restrict the
> > >>> process from accessing other files, processes, etc on the host.
> > >> 
> > >> NB it doesn't have to be SELinux. Libvirt also supports AppArmor and
> > >> can even do isolation with traditional DAC by putting each QEMU under
> > >> a distinct UID/GID and having libvirtd set ownership on resources each
> > >> VM is permitted to use.
> > >> 
> > >>> QEMU does not hold privileged resources that must be kept away from the
> > >>> guest.  An escaped guest can access its image file, tap file descriptor,
> > >>> etc but they are the same resources it could already access via device
> > >>> emulation.
> > >>> 
> > >>> Can you give specific examples of how disaggregation improves security?
> > > 
> > > Elena & collaborators: Dan has posted some ideas but please share yours
> > > so the security benefits of this patch series can be better understood.
> > > 
> > 
> > Dan covered the main point.  The security regime we use (selinux)
> > constrains the actions of processes on objects, so having multiple processes
> > allows us to apply more fine-grained policies.
> 
> Please share the SELinux policy files, containerization scripts, etc.
> There is probably a home for them in qemu.git, libvirt.git, or elsewhere
> upstream.
> 
> We need to find a way to make the sandboxing improvements available to
> users besides yourself and easily reusable for developers who wish to
> convert additional device models.

Ping?

Without the scripts/policies there is no security benefit from this
patch series.

Stefan




Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-22 Thread Daniel P . Berrangé
On Thu, Mar 21, 2019 at 08:26:47PM -0700, John G Johnson wrote:
> 
>  
> > >>>> On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> > 
> >> On Mar 7, 2019, at 11:27 AM, Stefan Hajnoczi wrote:
> >>> 
> >>> On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> >>>> I guess one obvious answer is that the existing security mechanisms like
> >>>> SELinux/AppArmor/DAC can be made to work in a more fine grained manner if
> >>>> there are distinct processes. This would allow for a more useful seccomp
> >>>> filter to better protect against secondary kernel exploits should QEMU
> >>>> itself be exploited, if we can protect individual components.
> >>> 
> >>> Fine-grained sandboxing is possible in theory but tedious in practice.
> >>> From what I can tell this patch series doesn't implement any sandboxing
> >>> for child processes.
> >>> 
> >> 
> >> The policies aren’t in QEMU, but in the selinux config files.
> >> They would say, for example, that when the QEMU process exec()s the
> >> disk emulation process, the process security context type transitions
> >> to a new type.  This type would have permission to access the VM image
> >> objects, whereas the QEMU process type (and any other device emulation
> >> process types) cannot access them.
> > 
> > 
> > Note that currently all QEMU instances run by libvirt have seccomp
> > policy applied that explicitly forbids any use of fork+exec as a way
> > to reduce avenues of attack for an exploited QEMU.
> > 
> > Even in a modularized QEMU I'd be loath to allow QEMU to have the
> > fork+exec privilege, unless "QEMU" in this case was just a stub
> > process that does nothing more than fork+exec the other binaries,
> > while having zero attack surface exposed to the untrusted guest OS.
> > 
> 
>   We’re looking at a couple ways to address your concerns.
> One is a stub process, as you mentioned above, but if we need to
> create programming to fork() and exec() the required emulation
> programs before exec()ing QEMU, then it may make sense to just put
> that programming into libvirt itself.
> 
>   Both paths would need similar changes to QEMU, such as the
> ability to receive descriptions of the emulation processes the parent
> process has created, and file descriptors that it has set up to
> communicate with them.  Each remote device would then be matched with
> its corresponding external process.
> 
>   The difference would be whether to create a new stub program
> to create the emulation processes, or delegate that task to libvirt’s
> QEMU driver.
> 
>   Do you have an opinion on a stub program vs libvirt integration?

Libvirt's preference would be to retain full control over what programs
are spawned. This allows us to control their resource usage, placement,
and security policies. Having a stub that hides this from libvirt will
make this control harder, as we'll then need to interrogate the stub
to find out what it did and apply controls appropriately. Also, if more
external processes need to be spawned when hotplugging a device, then
libvirt would definitely want to have full control, because once QEMU
vCPUs have been started we don't trust it.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-21 Thread John G Johnson


 
> >>>> On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> 
>> On Mar 7, 2019, at 11:27 AM, Stefan Hajnoczi wrote:
>>> 
>>> On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
>>>> I guess one obvious answer is that the existing security mechanisms like
>>>> SELinux/AppArmor/DAC can be made to work in a more fine grained manner if
>>>> there are distinct processes. This would allow for a more useful seccomp
>>>> filter to better protect against secondary kernel exploits should QEMU
>>>> itself be exploited, if we can protect individual components.
>>> 
>>> Fine-grained sandboxing is possible in theory but tedious in practice.
>>> From what I can tell this patch series doesn't implement any sandboxing
>>> for child processes.
>>> 
>> 
>> The policies aren’t in QEMU, but in the selinux config files.
>> They would say, for example, that when the QEMU process exec()s the
>> disk emulation process, the process security context type transitions
>> to a new type.  This type would have permission to access the VM image
>> objects, whereas the QEMU process type (and any other device emulation
>> process types) cannot access them.
> 
> 
> Note that currently all QEMU instances run by libvirt have seccomp
> policy applied that explicitly forbids any use of fork+exec as a way
> to reduce avenues of attack for an exploited QEMU.
> 
> Even in a modularized QEMU I'd be loath to allow QEMU to have the
> fork+exec privilege, unless "QEMU" in this case was just a stub
> process that does nothing more than fork+exec the other binaries,
> while having zero attack surface exposed to the untrusted guest OS.
> 

We’re looking at a couple ways to address your concerns.
One is a stub process, as you mentioned above, but if we need to
create programming to fork() and exec() the required emulation
programs before exec()ing QEMU, then it may make sense to just put
that programming into libvirt itself.

Both paths would need similar changes to QEMU, such as the
ability to receive descriptions of the emulation processes the parent
process has created, and file descriptors that it has set up to
communicate with them.  Each remote device would then be matched with
its corresponding external process.
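
As a rough illustration of the descriptor hand-off described above, here
is a minimal Python sketch, not the planned implementation. A launcher
(standing in for either the stub program or libvirt) creates a socket
pair, starts an emulation process that inherits one end, and keeps the
other end. The names spawn_emulation_process, ping and CHILD_CODE are
hypothetical, and the "emulation process" is a few lines of Python that
answer one message. In the real design the launcher would pass its end
on to QEMU rather than talk to the child itself.

```python
import socket
import subprocess
import sys

# Stand-in for a device emulation program: it receives the number of an
# inherited socket descriptor on its command line and answers one request.
CHILD_CODE = """
import socket, sys
sock = socket.socket(fileno=int(sys.argv[1]))
request = sock.recv(64)
sock.sendall(b"ack:" + request)
"""


def spawn_emulation_process():
    """Create a socket pair and start a child that inherits one end of it."""
    parent_end, child_end = socket.socketpair()
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE, str(child_end.fileno())],
        pass_fds=[child_end.fileno()],  # only this descriptor is inherited
    )
    child_end.close()  # the launcher keeps only its own end
    return child, parent_end


def ping(sock, payload):
    """Send one small message and return the child's reply."""
    sock.sendall(payload)
    return sock.recv(64)


if __name__ == "__main__":
    child, sock = spawn_emulation_process()
    print(ping(sock, b"hello"))
    sock.close()
    child.wait()
```

Whoever calls spawn_emulation_process is the party that needs the
fork+exec privilege, which is why the choice between a stub program and
libvirt matters.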

The difference would be whether to create a new stub program
to create the emulation processes, or delegate that task to libvirt’s
QEMU driver.

Do you have an opinion on a stub program vs libvirt integration?

JJ




Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-11 Thread Daniel P . Berrangé
On Thu, Mar 07, 2019 at 03:29:41PM -0800, John G Johnson wrote:
> 
> 
> > On Mar 7, 2019, at 11:27 AM, Stefan Hajnoczi  wrote:
> > 
> > On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> >> I guess one obvious answer is that the existing security mechanisms like
> >> SELinux/AppArmor/DAC can be made to work in a more fine grained manner if
> >> there are distinct processes. This would allow for a more useful seccomp
> >> filter to better protect against secondary kernel exploits should QEMU
> >> itself be exploited, if we can protect individual components.
> > 
> > Fine-grained sandboxing is possible in theory but tedious in practice.
> > From what I can tell this patch series doesn't implement any sandboxing
> > for child processes.
> > 
> 
>   The policies aren’t in QEMU, but in the selinux config files.
> They would say, for example, that when the QEMU process exec()s the
> disk emulation process, the process security context type transitions
> to a new type.  This type would have permission to access the VM image
> objects, whereas the QEMU process type (and any other device emulation
> process types) cannot access them.

Note that currently all QEMU instances run by libvirt have seccomp
policy applied that explicitly forbids any use of fork+exec as a way
to reduce avenues of attack for an exploited QEMU.

Even in a modularized QEMU I'd be loath to allow QEMU to have the
fork+exec privilege, unless "QEMU" in this case was just a stub
process that does nothing more than fork+exec the other binaries,
while having zero attack surface exposed to the untrusted guest OS.

>   If you wanted to use DAC, you could do something similar by
> making the disk emulation executable setuid to a UID that can access
> VM image files.
> 
>   In either case, the policies and permissions are set up before
> libvirt even runs, so it doesn’t need to be aware of them.

That's not the case bearing in mind the above point about fork+exec
being forbidden. It would likely require libvirt to be in charge of
spawning the various helper binaries from a trusted context.


> > How to do this in practice must be clear from the beginning if
> > fine-grained sandboxing is the main selling point.
> > 
> > Some details to start the discussion:
> > 
> > * How will fine-grained SELinux/AppArmor/DAC policies be configured for
> >   each process?  I guess this requires root, so does libvirt need to
> >   know about each process?
> > 
> 
>   The policies would apply to process security context types (or
> UIDs in a DAC regime), so I would not expect libvirt to be aware of them.

I'm pretty skeptical that such a large modularization of QEMU can be
done without libvirt being aware of it & needing some kind of changes
applied.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-08 Thread Elena Ufimtseva
On Thu, Mar 07, 2019 at 03:16:42PM +0100, Kevin Wolf wrote:
> On 07.03.2019 at 09:14, Thomas Huth wrote:
> > On 07/03/2019 08.22, elena.ufimts...@oracle.com wrote:
> > > From: Elena Ufimtseva 
> > > 
> > > TODO: Make relevant changes to the doc.
> > > 
> > > Signed-off-by: John G Johnson 
> > > Signed-off-by: Elena Ufimtseva 
> > > Signed-off-by: Jagannathan Raman 
> > > ---
> > >  docs/devel/qemu-multiprocess.txt | 1109 
> > > ++
> > >  1 file changed, 1109 insertions(+)
> > >  create mode 100644 docs/devel/qemu-multiprocess.txt
> > > 
> > > diff --git a/docs/devel/qemu-multiprocess.txt 
> > > b/docs/devel/qemu-multiprocess.txt
> > > new file mode 100644
> > > index 0000000..e29c6c8
> > > --- /dev/null
> > > +++ b/docs/devel/qemu-multiprocess.txt
> > > @@ -0,0 +1,1109 @@
> > > +/*
> > > + * Copyright 2019, Oracle and/or its affiliates. All rights reserved.
> > > + *
> > > + * Permission is hereby granted, free of charge, to any person obtaining 
> > > a copy
> > > + * of this software and associated documentation files (the "Software"), 
> > > to deal
> > > + * in the Software without restriction, including without limitation the 
> > > rights
> > > + * to use, copy, modify, merge, publish, distribute, sublicense, and/or 
> > > sell
> > > + * copies of the Software, and to permit persons to whom the Software is
> > > + * furnished to do so, subject to the following conditions:
> > > + *
> > > + * The above copyright notice and this permission notice shall be 
> > > included in
> > > + * all copies or substantial portions of the Software.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
> > > EXPRESS OR
> > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
> > > MERCHANTABILITY,
> > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT 
> > > SHALL
> > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
> > > OTHER
> > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
> > > ARISING FROM,
> > > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> > > DEALINGS IN
> > > + * THE SOFTWARE.
> > > + */
> > 
> > Somehow weird to see such a big license statement talking about
> > "software", but which applies to a text file only... Not sure if it is
> > an option for you, but maybe one of the Creative Commons licenses
> > (dual-licensed with the GPLv2+) would be a better fit? E.g. for the QEMU
> > website, the content is dual-licensed: https://www.qemu.org/license.html
> 

Thanks Thomas,
working on figuring this part out.

> While we're talking about licenses, the "All rights reserved." notice is
> out of place in a license header that declares that a lot of permissions
> are granted. Better to remove it to avoid any ambiguities that could
> result from the contradiction. (Applies to the whole series.)
>

Thanks Kevin,

This will be removed.

Elena

> Kevin



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-08 Thread Stefan Hajnoczi
On Thu, Mar 07, 2019 at 03:29:41PM -0800, John G Johnson wrote:
> > On Mar 7, 2019, at 11:27 AM, Stefan Hajnoczi  wrote:
> > On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> >> On Thu, Mar 07, 2019 at 02:26:09PM +0000, Stefan Hajnoczi wrote:
> >>> On Wed, Mar 06, 2019 at 11:22:53PM -0800, elena.ufimts...@oracle.com 
> >>> wrote:
> >>>> diff --git a/docs/devel/qemu-multiprocess.txt 
> >>>> b/docs/devel/qemu-multiprocess.txt
> >>>> new file mode 100644
> >>>> index 0000000..e29c6c8
> >>>> --- /dev/null
> >>>> +++ b/docs/devel/qemu-multiprocess.txt
> >>> 
> >>> Thanks for this document and the interesting work that you are doing.
> >>> I'd like to discuss the security advantages gained by disaggregating
> >>> QEMU in more detail.
> >>> 
> >>> The security model for VMs managed by libvirt (most production x86, ppc,
> >>> s390 guests) is that the QEMU process is untrusted and only has access
> >>> to resources belonging to the guest.  SELinux is used to restrict the
> >>> process from accessing other files, processes, etc on the host.
> >> 
> >> NB it doesn't have to be SELinux. Libvirt also supports AppArmor and
> >> can even do isolation with traditional DAC by putting each QEMU under
> >> a distinct UID/GID and having libvirtd set ownership on resources each
> >> VM is permitted to use.
> >> 
> >>> QEMU does not hold privileged resources that must be kept away from the
> >>> guest.  An escaped guest can access its image file, tap file descriptor,
> >>> etc but they are the same resources it could already access via device
> >>> emulation.
> >>> 
> >>> Can you give specific examples of how disaggregation improves security?
> > 
> > Elena & collaborators: Dan has posted some ideas but please share yours
> > so the security benefits of this patch series can be better understood.
> > 
> 
>   Dan covered the main point.  The security regime we use (selinux)
> constrains the actions of processes on objects, so having multiple processes
> allows us to apply more fine-grained policies.

Please share the SELinux policy files, containerization scripts, etc.
There is probably a home for them in qemu.git, libvirt.git, or elsewhere
upstream.

We need to find a way to make the sandboxing improvements available to
users besides yourself and easily reusable for developers who wish to
convert additional device models.

Thanks,
Stefan




Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-07 Thread John G Johnson



> On Mar 7, 2019, at 11:27 AM, Stefan Hajnoczi  wrote:
> 
> On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
>> On Thu, Mar 07, 2019 at 02:26:09PM +0000, Stefan Hajnoczi wrote:
>>> On Wed, Mar 06, 2019 at 11:22:53PM -0800, elena.ufimts...@oracle.com wrote:
>>>> diff --git a/docs/devel/qemu-multiprocess.txt 
>>>> b/docs/devel/qemu-multiprocess.txt
>>>> new file mode 100644
>>>> index 0000000..e29c6c8
>>>> --- /dev/null
>>>> +++ b/docs/devel/qemu-multiprocess.txt
>>> 
>>> Thanks for this document and the interesting work that you are doing.
>>> I'd like to discuss the security advantages gained by disaggregating
>>> QEMU in more detail.
>>> 
>>> The security model for VMs managed by libvirt (most production x86, ppc,
>>> s390 guests) is that the QEMU process is untrusted and only has access
>>> to resources belonging to the guest.  SELinux is used to restrict the
>>> process from accessing other files, processes, etc on the host.
>> 
>> NB it doesn't have to be SELinux. Libvirt also supports AppArmor and
>> can even do isolation with traditional DAC by putting each QEMU under
>> a distinct UID/GID and having libvirtd set ownership on resources each
>> VM is permitted to use.
>> 
>>> QEMU does not hold privileged resources that must be kept away from the
>>> guest.  An escaped guest can access its image file, tap file descriptor,
>>> etc but they are the same resources it could already access via device
>>> emulation.
>>> 
>>> Can you give specific examples of how disaggregation improves security?
> 
> Elena & collaborators: Dan has posted some ideas but please share yours
> so the security benefits of this patch series can be better understood.
> 

Dan covered the main point.  The security regime we use (selinux)
constrains the actions of processes on objects, so having multiple processes
allows us to apply more fine-grained policies.


>> I guess one obvious answer is that the existing security mechanisms like
>> SELinux/AppArmor/DAC can be made to work in a more fine-grained manner if
>> there are distinct processes. This would allow for a more useful seccomp
>> filter to better protect against secondary kernel exploits should QEMU
>> itself be exploited, if we can protect individual components.
> 
> Fine-grained sandboxing is possible in theory but tedious in practice.
> From what I can tell this patch series doesn't implement any sandboxing
> for child processes.
> 

The policies aren’t in QEMU, but in the selinux config files.
They would say, for example, that when the QEMU process exec()s the
disk emulation process, the process security context type transitions
to a new type.  This type would have permission to access the VM image
objects, whereas the QEMU process type (and any other device emulation
process types) cannot access them.

If you wanted to use DAC, you could do something similar by
making the disk emulation executable setuid to a UID that can access
VM image files.

In either case, the policies and permissions are set up before
libvirt even runs, so it doesn’t need to be aware of them.
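
To make the type-transition idea above concrete, here is a toy model in
Python. It is only a sketch: it is not SELinux policy syntax, and every type
name in it (qemu_t, qemu_disk_exec_t, qemu_disk_t, vm_image_t) is made up for
illustration. It shows the two pieces described above: exec() of the disk
emulation binary moves the process into a new type, and only that type is
allowed to touch VM image objects.

```python
from dataclasses import dataclass

# Hypothetical rule: (current domain, type of the executed file) -> new domain.
TYPE_TRANSITIONS = {
    ("qemu_t", "qemu_disk_exec_t"): "qemu_disk_t",
}

# Hypothetical rule: (domain, object type) -> operations that are permitted.
ALLOW_RULES = {
    ("qemu_disk_t", "vm_image_t"): {"read", "write"},
}


@dataclass(frozen=True)
class Process:
    domain: str


def exec_transition(proc, exe_type):
    """Return the process that results from exec()ing a file of exe_type.

    Without a matching transition rule the domain is inherited unchanged.
    """
    new_domain = TYPE_TRANSITIONS.get((proc.domain, exe_type), proc.domain)
    return Process(new_domain)


def allowed(proc, obj_type, op):
    """Default-deny check: an operation is permitted only if a rule grants it."""
    return op in ALLOW_RULES.get((proc.domain, obj_type), set())


qemu = Process("qemu_t")
disk = exec_transition(qemu, "qemu_disk_exec_t")
```

In this model the disk process can read the image while the main QEMU process
cannot. A real deployment would presumably express the same thing with
type_transition and allow rules in the SELinux policy, which is outside what
this sketch covers.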


> There must be a convenient way to get fine-grained sandboxing for
> disaggregated devices.  In other words, it shouldn't be left as an
> exercise to device process authors.
> 

We can add some MAC or DAC suggestions in the documentation.


> How to do this in practice must be clear from the beginning if
> fine-grained sandboxing is the main selling point.
> 
> Some details to start the discussion:
> 
> * How will fine-grained SELinux/AppArmor/DAC policies be configured for
>   each process?  I guess this requires root, so does libvirt need to
>   know about each process?
> 

The policies would apply to process security context types (or
UIDs in a DAC regime), so I would not expect libvirt to be aware of them.


> * We need to make sure that processes cannot send signals to each
>   other, ptrace, interfere in /proc/$PID, etc.  How will this be done?
> 

Any process type restrictions would be enforced by selinux.


> * Were you planning to use any other sandboxing mechanisms
>   (namespaces?)?  How will they be set up if the device process is
>   forked/executed by an unprivileged QEMU?
> 

All of the QEMU-related processes for a single VM will run
in the same container, but the container is created, along with its SELinux
policies, before libvirt is run.


>> Not everything is protected by MAC/DAC. For example network based disks
>> typically have a username + password for accessing the remote storage
>> server. Best practice would be a distinct username for every QEMU process
>> such that each can only access its own storage, but I don't know of any
>> app which does that. So ability to split off backends into separate
>> processes could limit exposure of information that is not otherwise
>> protected by current protection models.
> 
> If the disaggregated disk process with a global username + password is
> 

Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-07 Thread Stefan Hajnoczi
On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> On Thu, Mar 07, 2019 at 02:26:09PM +0000, Stefan Hajnoczi wrote:
> > On Wed, Mar 06, 2019 at 11:22:53PM -0800, elena.ufimts...@oracle.com wrote:
> > > diff --git a/docs/devel/qemu-multiprocess.txt 
> > > b/docs/devel/qemu-multiprocess.txt
> > > new file mode 100644
> > > index 0000000..e29c6c8
> > > --- /dev/null
> > > +++ b/docs/devel/qemu-multiprocess.txt
> > 
> > Thanks for this document and the interesting work that you are doing.
> > I'd like to discuss the security advantages gained by disaggregating
> > QEMU in more detail.
> > 
> > The security model for VMs managed by libvirt (most production x86, ppc,
> > s390 guests) is that the QEMU process is untrusted and only has access
> > to resources belonging to the guest.  SELinux is used to restrict the
> > process from accessing other files, processes, etc on the host.
> 
> NB it doesn't have to be SELinux. Libvirt also supports AppArmor and
> can even do isolation with traditional DAC by putting each QEMU under
> a distinct UID/GID and having libvirtd set ownership on resources each
> VM is permitted to use.
> 
> > QEMU does not hold privileged resources that must be kept away from the
> > guest.  An escaped guest can access its image file, tap file descriptor,
> > etc but they are the same resources it could already access via device
> > emulation.
> > 
> > Can you give specific examples of how disaggregation improves security?

Elena & collaborators: Dan has posted some ideas but please share yours
so the security benefits of this patch series can be better understood.

> I guess one obvious answer is that the existing security mechanisms like
> SELinux/AppArmor/DAC can be made to work in a more fine-grained manner if
> there are distinct processes. This would allow for a more useful seccomp
> filter to better protect against secondary kernel exploits should QEMU
> itself be exploited, if we can protect individual components.

Fine-grained sandboxing is possible in theory but tedious in practice.
From what I can tell this patch series doesn't implement any sandboxing
for child processes.

There must be a convenient way to get fine-grained sandboxing for
disaggregated devices.  In other words, it shouldn't be left as an
exercise to device process authors.

How to do this in practice must be clear from the beginning if
fine-grained sandboxing is the main selling point.

Some details to start the discussion:

 * How will fine-grained SELinux/AppArmor/DAC policies be configured for
   each process?  I guess this requires root, so does libvirt need to
   know about each process?

 * We need to make sure that processes cannot send signals to each
   other, ptrace, interfere in /proc/$PID, etc.  How will this be done?

 * Were you planning to use any other sandboxing mechanisms
   (namespaces?)?  How will they be set up if the device process is
   forked/executed by an unprivileged QEMU?
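
One way to check the signal question empirically, whatever mechanism ends up
enforcing it, is to probe from inside one device process whether it is
permitted to signal another. The Python sketch below uses the standard
signal-0 probe, which performs the kernel's permission and existence checks
without delivering anything. The helper name may_signal is an assumption for
illustration; it covers signals only, not ptrace or /proc access.

```python
import os


def may_signal(pid):
    """Probe whether the calling process is allowed to signal pid.

    Returns True if the kernel would permit it, False if permission is
    denied, and None if no process with that pid exists.
    """
    try:
        os.kill(pid, 0)  # signal 0: permission/existence check, nothing is sent
    except PermissionError:
        return False
    except ProcessLookupError:
        return None
    return True
```

A sandboxing test could call may_signal() with the PID of a sibling device
process and expect False once the processes run under distinct UIDs or
security contexts.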

> Not everything is protected by MAC/DAC. For example network based disks
> typically have a username + password for accessing the remote storage
> server. Best practice would be a distinct username for every QEMU process
> such that each can only access its own storage, but I don't know of any
> app which does that. So ability to split off backends into separate
> processes could limit exposure of information that is not otherwise
> protected by current protection models.

If the disaggregated disk process with a global username + password is
compromised then all your disk images are compromised.  So you still
need to follow the best practice of per-VM credentials even with
disaggregation, and if you do then disaggregation doesn't add anything!
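
The argument above can be spelled out with a small model. This is purely
illustrative: the process names, credential names and image names are
invented, and the function only computes which images are reachable with the
credential held by a compromised process.

```python
def exposed_images(compromised, credential_of, images_for):
    """Return the images reachable with the credential held by `compromised`."""
    return images_for[credential_of[compromised]]


# Case 1: every disk process holds the same global storage credential.
images_global = {"shared": {"vm1.img", "vm2.img", "vm3.img"}}
cred_global = {"disk-proc-vm1": "shared", "disk-proc-vm2": "shared"}

# Case 2: each VM has its own storage credential.
images_per_vm = {"vm1-user": {"vm1.img"}, "vm2-user": {"vm2.img"}}
cred_per_vm = {"disk-proc-vm1": "vm1-user", "disk-proc-vm2": "vm2-user"}
```

With the global credential, compromising one disk process exposes every
image; with per-VM credentials it exposes only that VM's image, and that
holds whether or not the disk code runs in a separate process.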

Stefan




Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-07 Thread Daniel P . Berrangé
On Thu, Mar 07, 2019 at 11:46:19AM -0500, Michael S. Tsirkin wrote:
> On Thu, Mar 07, 2019 at 04:19:44PM +0000, Daniel P. Berrangé wrote:
> > On Thu, Mar 07, 2019 at 11:05:36AM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> > > > I imagine the broadening of vhost-user support helps with that in
> > > > much the same way.
> > > 
> > > vhost-user has more of an impact but is also a bigger maintenance
> > > burden, as clients are packaged, can be restarted, etc. individually.
> > 
> > It feels like we've already accepted that cost, though, since
> > vhost-user exists today & has been expanding to cover more backends.
> 
> What I am trying to say is that we could easily add support for
> extensions just for in-tree code since these don't create an API that
> needs to be maintained.
> 
> So e.g. we do not need feature negotiation.

Ah, I see what you mean now. Having stuff in-tree makes migration
saner too since we don't have combinatorial expansion of impls to
worry about testing.

> But yes, this could be an extension of vhost-user in some way.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-07 Thread Michael S. Tsirkin
On Thu, Mar 07, 2019 at 04:19:44PM +0000, Daniel P. Berrangé wrote:
> On Thu, Mar 07, 2019 at 11:05:36AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> > > I imagine the broadening of vhost-user support helps with that in
> > > much the same way.
> > 
> > vhost-user has more of an impact but is also a bigger maintenance
> > burden, as clients are packaged, can be restarted, etc. individually.
> 
> It feels like we've already accepted that cost, though, since
> vhost-user exists today & has been expanding to cover more backends.
> 
> Regards,
> Daniel

What I am trying to say is that we could easily add support for
extensions just for in-tree code since these don't create an API that
needs to be maintained.

So e.g. we do not need feature negotiation.

But yes, this could be an extension of vhost-user in some way.

> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-07 Thread Daniel P . Berrangé
On Thu, Mar 07, 2019 at 11:05:36AM -0500, Michael S. Tsirkin wrote:
> On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> > I imagine the broadening of vhost-user support helps with that in
> > much the same way.
> 
> vhost-user has more of an impact but is also a bigger maintenance
> burden, as clients are packaged, can be restarted, etc. individually.

It feels like we've already accepted that cost, though, since
vhost-user exists today & has been expanding to cover more backends.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-07 Thread Michael S. Tsirkin
On Thu, Mar 07, 2019 at 02:51:20PM +0000, Daniel P. Berrangé wrote:
> I imagine the broadening of vhost-user support helps with that in
> much the same way.

vhost-user has more of an impact but is also a bigger maintenance
burden, as clients are packaged, can be restarted, etc. individually.

-- 
MST



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-07 Thread Konrad Rzeszutek Wilk
On Thu, Mar 07, 2019 at 03:21:47PM +0100, Thomas Huth wrote:
> On 07/03/2019 15.16, Kevin Wolf wrote:
> > Am 07.03.2019 um 09:14 hat Thomas Huth geschrieben:
> >> On 07/03/2019 08.22, elena.ufimts...@oracle.com wrote:
> >>> From: Elena Ufimtseva 
> >>>
> >>> TODO: Make relevant changes to the doc.
> >>>
> >>> Signed-off-by: John G Johnson 
> >>> Signed-off-by: Elena Ufimtseva 
> >>> Signed-off-by: Jagannathan Raman 
> >>> ---
> >>>  docs/devel/qemu-multiprocess.txt | 1109 
> >>> ++
> >>>  1 file changed, 1109 insertions(+)
> >>>  create mode 100644 docs/devel/qemu-multiprocess.txt
> >>>
> >>> diff --git a/docs/devel/qemu-multiprocess.txt 
> >>> b/docs/devel/qemu-multiprocess.txt
> >>> new file mode 100644
> >>> index 0000000..e29c6c8
> >>> --- /dev/null
> >>> +++ b/docs/devel/qemu-multiprocess.txt
> >>> @@ -0,0 +1,1109 @@
> >>> +/*
> >>> + * Copyright 2019, Oracle and/or its affiliates. All rights reserved.
> >>> + *
> >>> + * Permission is hereby granted, free of charge, to any person obtaining 
> >>> a copy
> >>> + * of this software and associated documentation files (the "Software"), 
> >>> to deal
> >>> + * in the Software without restriction, including without limitation the 
> >>> rights
> >>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or 
> >>> sell
> >>> + * copies of the Software, and to permit persons to whom the Software is
> >>> + * furnished to do so, subject to the following conditions:
> >>> + *
> >>> + * The above copyright notice and this permission notice shall be 
> >>> included in
> >>> + * all copies or substantial portions of the Software.
> >>> + *
> >>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
> >>> EXPRESS OR
> >>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
> >>> MERCHANTABILITY,
> >>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT 
> >>> SHALL
> >>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
> >>> OTHER
> >>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
> >>> ARISING FROM,
> >>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> >>> DEALINGS IN
> >>> + * THE SOFTWARE.
> >>> + */
> >>
> >> Somehow weird to see such a big license statement talking about
> >> "software", but which applies to a text file only... Not sure if it is
> >> an option for you, but maybe one of the Creative Commons licenses
> >> (dual-licensed with the GPLv2+) would be a better fit? E.g. for the QEMU
> >> website, the content is dual-licensed: https://www.qemu.org/license.html
> > 
> > While we're talking about licenses, the "All rights reserved." notice is
> > out of place in a license header that declares that a lot of permissions
> > are granted. Better to remove it to avoid any ambiguities that could
> > result from the contradiction. (Applies to the whole series.)
> 
> Apart from that, it is also not required for other work anymore. See:
> 
> https://en.wikipedia.org/wiki/All_rights_reserved

Interesting. Do folks know why the Linux Foundation does it?

See for example commit cf0d37aecc06801d4847fb36740da4a5690d9d45 (in the Linux
kernel), where they stamp every change with their 'All Rights Reserved'.

> 
>  Thomas



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-07 Thread Thomas Huth
On 07/03/2019 15.40, Konrad Rzeszutek Wilk wrote:
> On Thu, Mar 07, 2019 at 03:21:47PM +0100, Thomas Huth wrote:
>> On 07/03/2019 15.16, Kevin Wolf wrote:
>>> Am 07.03.2019 um 09:14 hat Thomas Huth geschrieben:
>>>> On 07/03/2019 08.22, elena.ufimts...@oracle.com wrote:
>>>>> From: Elena Ufimtseva 
>>>>>
>>>>> TODO: Make relevant changes to the doc.
>>>>>
>>>>> Signed-off-by: John G Johnson 
>>>>> Signed-off-by: Elena Ufimtseva 
>>>>> Signed-off-by: Jagannathan Raman 
>>>>> ---
>>>>>  docs/devel/qemu-multiprocess.txt | 1109 
>>>>> ++
>>>>>  1 file changed, 1109 insertions(+)
>>>>>  create mode 100644 docs/devel/qemu-multiprocess.txt
>>>>>
>>>>> diff --git a/docs/devel/qemu-multiprocess.txt 
>>>>> b/docs/devel/qemu-multiprocess.txt
>>>>> new file mode 100644
>>>>> index 0000000..e29c6c8
>>>>> --- /dev/null
>>>>> +++ b/docs/devel/qemu-multiprocess.txt
>>>>> @@ -0,0 +1,1109 @@
>>>>> +/*
>>>>> + * Copyright 2019, Oracle and/or its affiliates. All rights reserved.
>>>>> + *
>>>>> + * Permission is hereby granted, free of charge, to any person obtaining 
>>>>> a copy
>>>>> + * of this software and associated documentation files (the "Software"), 
>>>>> to deal
>>>>> + * in the Software without restriction, including without limitation the 
>>>>> rights
>>>>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or 
>>>>> sell
>>>>> + * copies of the Software, and to permit persons to whom the Software is
>>>>> + * furnished to do so, subject to the following conditions:
>>>>> + *
>>>>> + * The above copyright notice and this permission notice shall be 
>>>>> included in
>>>>> + * all copies or substantial portions of the Software.
>>>>> + *
>>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>>>>> EXPRESS OR
>>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>>>>> MERCHANTABILITY,
>>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT 
>>>>> SHALL
>>>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
>>>>> OTHER
>>>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
>>>>> ARISING FROM,
>>>>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
>>>>> DEALINGS IN
>>>>> + * THE SOFTWARE.
>>>>> + */
>>>>
>>>> Somehow weird to see such a big license statement talking about
>>>> "software", but which applies to a text file only... Not sure if it is
>>>> an option for you, but maybe one of the Creative Commons licenses
>>>> (dual-licensed with the GPLv2+) would be a better fit? E.g. for the QEMU
>>>> website, the content is dual-licensed: https://www.qemu.org/license.html
>>>
>>> While we're talking about licenses, the "All rights reserved." notice is
>>> out of place in a license header that declares that a lot of permissions
>>> are granted. Better to remove it to avoid any ambiguities that could
>>> result from the contradiction. (Applies to the whole series.)
>>
>> Apart from that, it is also not required for other work anymore. See:
>>
>> https://en.wikipedia.org/wiki/All_rights_reserved
> 
> Interesting. Do folks know why the Linux Foundation does it?
> 
> See for example commit cf0d37aecc06801d4847fb36740da4a5690d9d45 (in the Linux
> kernel), where they stamp every change with their 'All Rights Reserved'.

No clue why they use it. Seems unnecessary to me. But as always: IANAL

 Thomas



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-07 Thread Daniel P . Berrangé
On Thu, Mar 07, 2019 at 02:26:09PM +0000, Stefan Hajnoczi wrote:
> On Wed, Mar 06, 2019 at 11:22:53PM -0800, elena.ufimts...@oracle.com wrote:
> > diff --git a/docs/devel/qemu-multiprocess.txt 
> > b/docs/devel/qemu-multiprocess.txt
> > new file mode 100644
> > index 0000000..e29c6c8
> > --- /dev/null
> > +++ b/docs/devel/qemu-multiprocess.txt
> 
> Thanks for this document and the interesting work that you are doing.
> I'd like to discuss the security advantages gained by disaggregating
> QEMU in more detail.
> 
> The security model for VMs managed by libvirt (most production x86, ppc,
> s390 guests) is that the QEMU process is untrusted and only has access
> to resources belonging to the guest.  SELinux is used to restrict the
> process from accessing other files, processes, etc on the host.

NB it doesn't have to be SELinux. Libvirt also supports AppArmor and
can even do isolation with traditional DAC by putting each QEMU under
a distinct UID/GID and having libvirtd set ownership on resources each
VM is permitted to use.

> QEMU does not hold privileged resources that must be kept away from the
> guest.  An escaped guest can access its image file, tap file descriptor,
> etc but they are the same resources it could already access via device
> emulation.
> 
> Can you give specific examples of how disaggregation improves security?

I guess one obvious answer is that the existing security mechanisms like
SELinux/AppArmor/DAC can be made to work in a more fine-grained manner if
there are distinct processes. This would allow for a more useful seccomp
filter to better protect against secondary kernel exploits should QEMU
itself be exploited, if we can protect individual components.

Not everything is protected by MAC/DAC. For example network based disks
typically have a username + password for accessing the remote storage
server. Best practice would be a distinct username for every QEMU process
such that each can only access its own storage, but I don't know of any
app which does that. So ability to split off backends into separate
processes could limit exposure of information that is not otherwise
protected by current protection models.

Whether any of this is useful in practice depends on the degree to
which the individual disaggregated pieces of QEMU trust each other.
Effectively they would have to consider each other as untrusted,
so one compromised piece can't simply trigger its desired exploit
via the communication channel with another disaggregated piece.
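
As a sketch of what "consider each other as untrusted" means on the
communication channel, the Python below validates every field of an incoming
message before using it. The wire format is entirely hypothetical (a 4-byte
command id and a 4-byte payload length, both big-endian, followed by the
payload); it is not the format used by the patch series or by vhost-user.

```python
import struct

HEADER = struct.Struct("!II")   # hypothetical header: command id, payload length
MAX_PAYLOAD = 4096              # hypothetical upper bound on payload size
KNOWN_COMMANDS = {1, 2}         # hypothetical set of valid command ids


def parse_message(buf):
    """Validate one message from an untrusted peer.

    Returns (command, payload) or raises ValueError if anything is off.
    """
    if len(buf) < HEADER.size:
        raise ValueError("short header")
    command, length = HEADER.unpack_from(buf)
    if command not in KNOWN_COMMANDS:
        raise ValueError("unknown command")
    if length > MAX_PAYLOAD:
        raise ValueError("payload too large")
    if len(buf) - HEADER.size != length:
        raise ValueError("length field does not match payload")
    return command, bytes(buf[HEADER.size:])
```

The point of the design is that the receiver never trusts the sender's length
or command fields, so a compromised peer can at most have its message
rejected.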

I imagine the broadening of vhost-user support helps with that in much
the same way.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-07 Thread Stefan Hajnoczi
On Wed, Mar 06, 2019 at 11:22:53PM -0800, elena.ufimts...@oracle.com wrote:
> diff --git a/docs/devel/qemu-multiprocess.txt 
> b/docs/devel/qemu-multiprocess.txt
> new file mode 100644
> index 0000000..e29c6c8
> --- /dev/null
> +++ b/docs/devel/qemu-multiprocess.txt

Thanks for this document and the interesting work that you are doing.
I'd like to discuss the security advantages gained by disaggregating
QEMU in more detail.

The security model for VMs managed by libvirt (most production x86, ppc,
s390 guests) is that the QEMU process is untrusted and only has access
to resources belonging to the guest.  SELinux is used to restrict the
process from accessing other files, processes, etc on the host.

QEMU does not hold privileged resources that must be kept away from the
guest.  An escaped guest can access its image file, tap file descriptor,
etc but they are the same resources it could already access via device
emulation.

Can you give specific examples of how disaggregation improves security?

Stefan




Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-07 Thread Thomas Huth
On 07/03/2019 15.16, Kevin Wolf wrote:
> Am 07.03.2019 um 09:14 hat Thomas Huth geschrieben:
>> On 07/03/2019 08.22, elena.ufimts...@oracle.com wrote:
>>> From: Elena Ufimtseva 
>>>
>>> TODO: Make relevant changes to the doc.
>>>
>>> Signed-off-by: John G Johnson 
>>> Signed-off-by: Elena Ufimtseva 
>>> Signed-off-by: Jagannathan Raman 
>>> ---
>>>  docs/devel/qemu-multiprocess.txt | 1109 
>>> ++
>>>  1 file changed, 1109 insertions(+)
>>>  create mode 100644 docs/devel/qemu-multiprocess.txt
>>>
>>> diff --git a/docs/devel/qemu-multiprocess.txt 
>>> b/docs/devel/qemu-multiprocess.txt
>>> new file mode 100644
>>> index 0000000..e29c6c8
>>> --- /dev/null
>>> +++ b/docs/devel/qemu-multiprocess.txt
>>> @@ -0,0 +1,1109 @@
>>> +/*
>>> + * Copyright 2019, Oracle and/or its affiliates. All rights reserved.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a 
>>> copy
>>> + * of this software and associated documentation files (the "Software"), 
>>> to deal
>>> + * in the Software without restriction, including without limitation the 
>>> rights
>>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or 
>>> sell
>>> + * copies of the Software, and to permit persons to whom the Software is
>>> + * furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be included 
>>> in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>>> OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
>>> OTHER
>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
>>> FROM,
>>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS 
>>> IN
>>> + * THE SOFTWARE.
>>> + */
>>
>> Somehow weird to see such a big license statement talking about
>> "software", but which applies to a text file only... Not sure if it is
>> an option for you, but maybe one of the Creative Commons licenses
>> (dual-licensed with the GPLv2+) would be a better fit? E.g. for the QEMU
>> website, the content is dual-licensed: https://www.qemu.org/license.html
> 
> While we're talking about licenses, the "All rights reserved." notice is
> out of place in a license header that declares that a lot of permissions
> are granted. Better to remove it to avoid any ambiguities that could
> result from the contradiction. (Applies to the whole series.)

Apart from that, it is also not required for other work anymore. See:

https://en.wikipedia.org/wiki/All_rights_reserved

 Thomas



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-07 Thread Kevin Wolf
Am 07.03.2019 um 09:14 hat Thomas Huth geschrieben:
> On 07/03/2019 08.22, elena.ufimts...@oracle.com wrote:
> > From: Elena Ufimtseva 
> > 
> > TODO: Make relevant changes to the doc.
> > 
> > Signed-off-by: John G Johnson 
> > Signed-off-by: Elena Ufimtseva 
> > Signed-off-by: Jagannathan Raman 
> > ---
> >  docs/devel/qemu-multiprocess.txt | 1109 ++
> >  1 file changed, 1109 insertions(+)
> >  create mode 100644 docs/devel/qemu-multiprocess.txt
> > 
> > diff --git a/docs/devel/qemu-multiprocess.txt b/docs/devel/qemu-multiprocess.txt
> > new file mode 100644
> > index 000..e29c6c8
> > --- /dev/null
> > +++ b/docs/devel/qemu-multiprocess.txt
> > @@ -0,0 +1,1109 @@
> > +/*
> > + * Copyright 2019, Oracle and/or its affiliates. All rights reserved.
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a copy
> > + * of this software and associated documentation files (the "Software"), to deal
> > + * in the Software without restriction, including without limitation the rights
> > + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> > + * copies of the Software, and to permit persons to whom the Software is
> > + * furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> > + * THE SOFTWARE.
> > + */
> 
> Somehow weird to see such a big license statement talking about
> "software", but which applies to a text file only... Not sure if it is
> an option for you, but maybe one of the Creative Commons licenses
> (dual-licensed with the GPLv2+) would be a better fit? E.g. for the QEMU
> website, the content is dual-licensed: https://www.qemu.org/license.html

While we're talking about licenses, the "All rights reserved." notice is
out of place in a license header that declares that a lot of permissions
are granted. Better to remove it to avoid any ambiguities that could
result from the contradiction. (Applies to the whole series.)

Kevin



Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess

2019-03-07 Thread Thomas Huth
On 07/03/2019 08.22, elena.ufimts...@oracle.com wrote:
> From: Elena Ufimtseva 
> 
> TODO: Make relevant changes to the doc.
> 
> Signed-off-by: John G Johnson 
> Signed-off-by: Elena Ufimtseva 
> Signed-off-by: Jagannathan Raman 
> ---
>  docs/devel/qemu-multiprocess.txt | 1109 ++
>  1 file changed, 1109 insertions(+)
>  create mode 100644 docs/devel/qemu-multiprocess.txt
> 
> diff --git a/docs/devel/qemu-multiprocess.txt b/docs/devel/qemu-multiprocess.txt
> new file mode 100644
> index 000..e29c6c8
> --- /dev/null
> +++ b/docs/devel/qemu-multiprocess.txt
> @@ -0,0 +1,1109 @@
> +/*
> + * Copyright 2019, Oracle and/or its affiliates. All rights reserved.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */

Somehow weird to see such a big license statement talking about
"software", but which applies to a text file only... Not sure if it is
an option for you, but maybe one of the Creative Commons licenses
(dual-licensed with the GPLv2+) would be a better fit? E.g. for the QEMU
website, the content is dual-licensed: https://www.qemu.org/license.html

 Thomas