Re: [Xen-devel] Livepatching and Xen Security

2017-05-22 Thread Konrad Rzeszutek Wilk
> > 1. Having tested live-patching thoroughly for at least some version of
> > the codebase
> >
> > 2. Having tested live-patching for one of the Xen 4.9 RCs.
> >
> > Thoughts?
> 
> As a statement of what XenServer is doing:

As a statement of what Oracle is doing:

We have been using livepatching for a year or so.

It is a bit older (thanks to the Xend dependency), so not as
fresh as Xen 4.9.

We have shipped quite a few livepatches, including some that are not
XSAs, with success. We did run into some issues:

 - We compiled Xen with '--maxcpus=384', but the livepatch tools didn't use
   that option, which made all the GCC-generated local variable names
   different. Once we got the --maxcpus=XYZ values to match it all was good,
   but that took a while to figure out.

 - We had an interesting issue where the .fixup and .ex_table were not
   properly updated (Ross fixed that in the tool upstream and it was backported
   in livepatch-tools). That was in Xen 4.8 timeframe.
   
https://github.com/rosslagerwall/livepatch-build-tools/commit/ae7ae0c31866f6ee2715a601fd5067d700d6084a
  

 - Replacing the livepatch with another hit a snag if both livepatches
   had the same symbol name (x86_emulate.c#_get_fpu). We came up with
   a skanky tool (symbol_rename) that just renames symbols, which we
   use to always rename symbols (symbol_rename 83e0707.livepatch
   "x86_emulate.c#_get_fpu" get_fpu_83e0707).

Those are livepatch-tools issues, not hypervisor code issues (albeit
the last one could be fixed in the hypervisor by having code to
distinguish global and local symbols and ignore collisions between
local symbols).
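For illustration, the collision-and-rename idea can be sketched like this (a hypothetical Python model only; the real symbol_rename tool rewrites ELF symbol/string tables, and objcopy --redefine-sym does much the same job):

```python
# Hypothetical sketch of the symbol_rename idea: when two livepatches
# carry the same file-local symbol (e.g. "x86_emulate.c#_get_fpu"),
# disambiguate by suffixing the symbol with the livepatch's short id.
def rename_colliding_symbols(patches):
    """patches: dict of patch_id -> set of symbol names.
    Returns patch_id -> {old_name: new_name} rename maps."""
    seen = {}
    for pid, syms in patches.items():
        for sym in syms:
            seen.setdefault(sym, []).append(pid)

    renames = {pid: {} for pid in patches}
    for sym, pids in seen.items():
        if len(pids) > 1:  # collision between livepatches
            for pid in pids:
                # e.g. "x86_emulate.c#_get_fpu" -> "get_fpu_83e0707"
                base = sym.rsplit("#", 1)[-1].lstrip("_")
                renames[pid][sym] = "%s_%s" % (base, pid)
    return renames

patches = {
    "83e0707": {"x86_emulate.c#_get_fpu", "fix_a"},
    "1ab9f2c": {"x86_emulate.c#_get_fpu", "fix_b"},
}
print(rename_colliding_symbols(patches))
```

Renaming unconditionally (as we do) sidesteps having to predict which symbols a future livepatch might also carry.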

With the hypervisor code, we have not had any issues - it has been
running smoothly with various types of guests.

Also, I've been running (almost every night) a test of the livepatches
that are part of the Xen source.

> 
> Independent of this, the nature of what qualifies as "a correct patch"
> is subjective and very context dependent.  Consider a scenario with two
> users, the same version of the livepatch tools, an identical source
> patch, and an identical source version of Xen.  There is a very real
> possibility that these two users could get one valid and one invalid
> patch based solely on something like the compiler settings used to build
> the hypervisor they are patching.

Yes. It is imperative that the livepatch be built with the exact same
compiler as the one the hypervisor was built with. Fortunately the
config.h file exposes all of that, so it is easy enough to verify.
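That kind of verification amounts to a diff of the recorded build settings; a minimal sketch (hypothetical keys and values, just to show the shape of the check):

```python
# Hypothetical sketch: compare the build settings recorded for the
# running hypervisor against those used to build a livepatch.  Any
# mismatch (compiler version, CFLAGS, an option like --maxcpus) is
# fatal, since it can silently change local symbol names and layout.
def check_build_match(hypervisor_cfg, livepatch_cfg, keys):
    mismatches = []
    for key in keys:
        hv, lp = hypervisor_cfg.get(key), livepatch_cfg.get(key)
        if hv != lp:
            mismatches.append((key, hv, lp))
    return mismatches

hv = {"CC_VERSION": "gcc 4.8.5", "CFLAGS": "-O2 -g", "MAX_CPUS": "384"}
lp = {"CC_VERSION": "gcc 4.8.5", "CFLAGS": "-O2 -g", "MAX_CPUS": "256"}
bad = check_build_match(hv, lp, ["CC_VERSION", "CFLAGS", "MAX_CPUS"])
print(bad)  # flags the MAX_CPUS mismatch, as in the --maxcpus story
```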
..snip..

> Therefore, I think it would be a mistake for us to include anything
> pertaining to "creating a livepatch, correct or otherwise" within a
> support statement.  There are many variables which we as upstream can't
> control.

It may be good to include an FAQ or the like describing some of these
issues (i.e. a knowledge base) that details our findings and how
we worked around them.

I will update the Wiki with it regardless of this discussion.
> 
> As for the 4th point, about what a guest can do to prevent application
> of a livepatch.
> 
> The default timeout is insufficient to quiesce Xen if a VM with a few
> VCPUs is migrating.  In this scenario, I believe p2m_lock contention is
> the underlying reason, but the point stands that there are plenty of
> things a guest can do to prevent Xen being able to suitably quiesce.
> 
> As a host administrator attempting to apply the livepatch, you get
> informed that Xen failed to quiesce and the livepatch application
> failed.  Options range from upping the timeout on the next patching
> attempt, to possibly even manually pausing the troublesome VM for a second.

Pausing a VM is acceptable, I would think.
> 
> I also think it unwise to consider any scenarios like this within the
> security statement, otherwise we will have to issue an XSA stating
> "Guests doing normal unprivileged things can cause Xen to be
> insufficiently quiescent to apply livepatches with the deliberately
> conservative defaults".  What remediation would we suggest for this?

xl pause :-)

> 
> 
> On the points of unexpected access to the hypercalls, and Xen doing the
> wrong thing when presented with a legitimate correct livepatch, I think
> these are in principle fine for inclusion within a support statement.

Yes.
> 
> I would ask however how confident we are that there are no ELF parsing
> bugs in the code?  I think it might be very prudent to try and build a
> userspace harness for it and let AFL have a go.

Jan did an excellent job when reviewing the code. But there is of course
the possibility that something slipped our mind.

I would (in my opinion as livepatch maintainer) prefer not to gate the
support statement on this as well; I have no clue how to set up AFL
and this may take quite a while to get done.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Livepatching and Xen Security

2017-05-19 Thread Ian Jackson
Andrew Cooper writes ("Re: [Xen-devel] Livepatching and Xen Security"):
> livepatching doesn't use libelf.
> 
> It is a new ELF parsing implementation.

I don't think we care very much about bugs in the livepatching elf
parser.  The livepatches are all completely trusted in any case.

Furthermore, I don't think we consider the binary code or pieces of
the headers or bits of the livepatching loader tools memory map or
anything secret.  So uninitialised structure bugs just leak things we
don't care about.

Does that make sense ?

Ian.



Re: [Xen-devel] Livepatching and Xen Security

2017-05-19 Thread Andrew Cooper
On 19/05/17 15:32, Wei Liu wrote:
> On Thu, May 18, 2017 at 08:07:00PM +0100, Andrew Cooper wrote:
>> I would ask however how confident we are that there are no ELF parsing
>> bugs in the code?  I think it might be very prudent to try and build a
>> userspace harness for it and let AFL have a go.
>>
> There is already a fuzzing harness in tools/fuzz for libelf.  Feel free
> to use it with either AFL or LLVM fuzzer. :-)
>
> If it doesn't work, report it and I will fix it.

livepatching doesn't use libelf.

It is a new ELF parsing implementation.

~Andrew



Re: [Xen-devel] Livepatching and Xen Security

2017-05-19 Thread Wei Liu
On Thu, May 18, 2017 at 08:07:00PM +0100, Andrew Cooper wrote:
> I would ask however how confident we are that there are no ELF parsing
> bugs in the code?  I think it might be very prudent to try and build a
> userspace harness for it and let AFL have a go.
> 

There is already a fuzzing harness in tools/fuzz for libelf.  Feel free
to use it with either AFL or LLVM fuzzer. :-)

If it doesn't work, report it and I will fix it.

> ~Andrew
> 



Re: [Xen-devel] Livepatching and Xen Security

2017-05-18 Thread Andrew Cooper
On 18/05/17 17:40, George Dunlap wrote:
> There are four general areas I think there may be bugs.
>
> ## Unprivileged access to Livepatching hypercalls
>
> ## Bugs in the patch creation tools which create patches with vulnerabilities
>
> ## Bugs in the patch-application code such that vulnerabilities exist
> after application
>
> ## Bugs which allow a guest to prevent the application of a livepatch
>
> # Testing status and requirements
>
> I'm told we already test that unprivileged guests cannot access the
Livepatch hypercalls in osstest; if so, that aspect should be
> covered.

Specifically, http://xenbits.xen.org/docs/xtf/test-livepatch-priv-check.html

(There is a docs bug I have noticed while grabbing that link, which I
have just pushed a fix for.  The live docs will be updated whenever cron
next runs.)

(I'd also like to take this opportunity to highlight an issue which
became apparent while writing that test; unstable hypercall ABIs,
wherever they reside, make this kind of testing prone to false negatives.)

> All that's needed would be for vendors to describe what kinds of
> testing they have done for Livepatching.  I think there are two
> factors which come into play:
>
> 1. Having tested live-patching thoroughly for at least some version of
> the codebase
>
> 2. Having tested live-patching for one of the Xen 4.9 RCs.
>
> Thoughts?

As a statement of what XenServer is doing:

XenServer 7.1 is based on Xen 4.7 and we are providing livepatches
(where applicable) with hypervisor hotfixes.  Thus far, XSAs 204, 207,
212-215 have been included in livepatch form, as well as a number of
other general bugfixes which were safe to livepatch.

For both of these hotfixes, we had to bugfix the livepatch creation tools
to generate a livepatch.  We also had to modify the XSA 213 patch to
create a livepatch.  The (pre 4.8) Xen code was buggy and used the
.fixup section when it should have used .text.unlikely, which caused the
livepatch tools to fail a cross-check of the exception frame references
when building the patch.
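That cross-check can be sketched roughly as follows (a hypothetical data layout; the real tool walks the ELF sections of the object files): every recovery address referenced from the exception table should land in the section the tools expect.

```python
# Hypothetical sketch of the exception-frame cross-check: each
# .ex_table entry points at a recovery address, which is expected to
# live in a known section (.fixup here).  A fixup landing elsewhere
# (e.g. placed in the wrong section, per the pre-4.8 bug above)
# fails the check.
def cross_check_ex_table(ex_table, sections, expected=".fixup"):
    """ex_table: list of (fault_addr, fixup_addr);
    sections: dict name -> (start, end).  Returns offending entries."""
    start, end = sections[expected]
    return [(f, fx) for f, fx in ex_table if not (start <= fx < end)]

sections = {".text": (0x1000, 0x8000), ".fixup": (0x9000, 0x9100)}
ex_table = [(0x1234, 0x9010),   # fine: fixup is inside .fixup
            (0x2345, 0x7f00)]   # bad: fixup points into .text
print(cross_check_ex_table(ex_table, sections))
```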

Thus, we are 0 for 2 on the tools being able to DTRT when given a set of
real-world fixes.

Independent of this, the nature of what qualifies as "a correct patch"
is subjective and very context dependent.  Consider a scenario with two
users, the same version of the livepatch tools, an identical source
patch, and an identical source version of Xen.  There is a very real
possibility that these two users could get one valid and one invalid
patch based solely on something like the compiler settings used to build
the hypervisor they are patching.

From an XSA point of view, we do not want to be issuing advisories
saying "If you are on OS $A, with livepatch tools $B, Hypervisor $C
compiled with these specific build options, then trying to create a
livepatch for patch $D will appear to work properly but leave a timebomb
in your hypervisor".  ISTR an issue which hit during development where
CentOS released a minor update to GCC and caused chaos by altering how
the string literals got sorted.

What if a user creates a livepatch for a change which isn't remotely
safe to livepatch, uploads it, and their hypervisor goes bang?  This
would qualify under the definition of "correct" in so far as the patch
was correctly doing what it was told, and thus, fall within the security
criteria presented here.

There is already a very high user requirement in the first place to
evaluate whether patches are safe to livepatch.  This includes
interaction with other livepatches, interactions with patches in the
vendor's patch queue, interaction with customer hardware, and there is no
way this can be decided automatically.

Therefore, I think it would be a mistake for us to include anything
pertaining to "creating a livepatch, correct or otherwise" within a
support statement.  There are many variables which we as upstream can't
control.

As for the 4th point, about what a guest can do to prevent application
of a livepatch.

The default timeout is insufficient to quiesce Xen if a VM with a few
VCPUs is migrating.  In this scenario, I believe p2m_lock contention is
the underlying reason, but the point stands that there are plenty of
things a guest can do to prevent Xen being able to suitably quiesce.

As a host administrator attempting to apply the livepatch, you get
informed that Xen failed to quiesce and the livepatch application
failed.  Options range from upping the timeout on the next patching
attempt, to possibly even manually pausing the troublesome VM for a second.

I also think it unwise to consider any scenarios like this within the
security statement, otherwise we will have to issue an XSA stating
"Guests doing normal unprivileged things can cause Xen to be
insufficiently quiescent to apply livepatches with the deliberately
conservative defaults".  What remediation would we suggest for this?


On the points of unexpected access to the hypercalls, and Xen doing the
wrong thing when presented with a legitimate correct livepatch, I think
these are in principle fine for inclusion within a support statement.

Re: [Xen-devel] Livepatching and Xen Security

2017-05-18 Thread Lars Kurth


On 18/05/2017 17:53, "Ian Jackson"  wrote:

>George Dunlap writes ("Livepatching and Xen Security"):
>> # Executive summary
>
>I am completely in agreement with your analysis and your conclusions.

Me too. I am not sure though whether we need a vote or lazy consensus.

For Credit2 (see 
https://lists.xenproject.org/archives/html/xen-devel/2016-11/msg00171.html)
 we used a code patch removing Experimental from KCONFIG, which implicitly
led to active confirmation by project leadership members via Acked-by
tags. In other words, we executed the equivalent of a vote.

From a process perspective
https://xenproject.org/governance.html#lazyconsensus should be sufficient
because we are applying a process, not changing one. In addition lazy
consensus decisions can only be overturned if the project leadership
agrees collectively to do so, because the decision is too important for
lazy consensus. As every leadership member is on the list, there would be
ample opportunity to raise objections.

My gut feeling though is that Leadership Team members and Security Team
members should probably actively agree/disagree to proposals related to
whether a feature is supported and thus security supported to avoid any
future issues. George already expressed an implicit +1 (by making the
proposal) and Ian an explicit +1.

Regards
Lars



Re: [Xen-devel] Livepatching and Xen Security

2017-05-18 Thread Ian Jackson
George Dunlap writes ("Livepatching and Xen Security"):
> # Executive summary

I am completely in agreement with your analysis and your conclusions.

Ian.



[Xen-devel] Livepatching and Xen Security

2017-05-18 Thread George Dunlap
# Executive summary

* It is important for the Livepatching feature to be declared
"security supported".

* At the moment it's not clear whether we can get osstest support in
by the release, or, if we did, whether that would be considered
sufficient to consider the feature supported.

* I would argue that if various vendors can demonstrate that they have
tested the key functionality of the feature in one of the 4.9 RCs at
least as well as osstest would, that should be sufficient to
declare it security supported for 4.9.

# Why is this important?

Having a feature "security supported" does two things.

First of all, it indicates to users which parts of Xen are considered
safe to use in environments where security is important.

One of the key selling points of Xen is its security compared to other
hypervisors.  But there is a lot of functionality which *can* be
enabled in Xen that isn't really secure: for instance, there are
hundreds of devices emulated by qemu that you *can* enable, but for
which it would not be really safe in the face of a determined
attacker.  Statements of security support help indicate to users which
subset of functionality they should use.

Secondly, it indicates whether a bug discovered in the feature will go
through the XenProject Security Response Process -- i.e., whether it
will be issued an XSA.

Issuing an XSA does two things.  First, it allows people on the
predisclosure list to update their systems before issues are made
known to the entire world.  Secondly, it gives people *not* on the
predisclosure list a structured way of finding out about important
security issues about which they may need to take some action.

# Potential Livepatch security issues

To think about what "security support for Livepatching" means, we have
to answer the question, "If there were a security-related bug
in Livepatching, what would it look like?"  This helps us think about
what kind of promises we would give, and what kinds of testing we
would consider necessary to provide security support.

In each case, we must ask: Is this the kind of bug that software
vendors and cloud providers would want embargoed notice for?  Is this
the kind of bug that members not on the pre-disclosure list would like
to be notified of so they can react quickly?

There are four general areas I think there may be bugs.

## Unprivileged access to Livepatching hypercalls

First, and most obvious, would be a bug that allowed unprivileged
guests access to the livepatching hypercalls.  Livepatching's explicit
purpose is to make it as easy as possible to execute arbitrary code
inside the hypervisor, so any bug which gave an unprivileged guest
access would obviously be a major security issue.

## Bugs in the patch creation tools which create patches with vulnerabilities

Suppose that we have hypervisor version A, which has a security
vulnerability, and a software patch which generates hypervisor version
B, which, when built and booted as a normal binary will not have the
vulnerability.

The livepatch creation tools are designed to help developers create a
binary blob, such that when uploaded into a running copy of hypervisor
A, will cause it to behave like hypervisor B.

But suppose there was a bug in the tools, such that the blob would
successfully upload, but leave open vulnerabilities -- either by not
correctly fixing the original vulnerability, or by opening yet a
different vulnerability.

## Bugs in the patch-application code such that vulnerabilities exist
after application

Suppose instead that we have a binary patch which is correct -- i.e.,
if it were applied as designed it would close the vulnerability and
not open a new one.  But suppose that there was a bug in the patch
application code such that, after the patch process reported success,
known vulnerabilities existed.

As a simple example, suppose that the patch was simply not applied,
but that it was reported as applied anyway.

Or suppose that the patch was applied partially, but not completely.
For instance, say there was an off-by-one error that caused only 2 of
3 required functions to be patched.

Or again, suppose there was a bug in the patch application code such
that, as a side effect of applying the patch and closing the original
vulnerability, a secondary vulnerability was opened up -- say, an
accidental modification to a hypercall table or IDT entries.
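The off-by-one example is easy to sketch, together with the post-application verification pass that would catch it (a hypothetical model in which each patch site is just a flag):

```python
# Hypothetical sketch of the off-by-one example: a buggy applier that
# patches only 2 of 3 function sites, and a verification pass that
# re-reads every site afterwards instead of trusting the return value.
def buggy_apply(sites):
    for i in range(len(sites) - 1):   # off-by-one: last site skipped
        sites[i]["patched"] = True
    return True                        # reports success regardless

def verify_applied(sites):
    return all(s["patched"] for s in sites)

sites = [{"name": n, "patched": False} for n in ("f1", "f2", "f3")]
ok = buggy_apply(sites)
print(ok, verify_applied(sites))  # reported success, but f3 unpatched
```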

## Bugs which allow a guest to prevent the application of a livepatch

In order to apply a patch, the livepatching code first makes several
checks, then pauses the entire system while it finishes the change.
To pause the system, it sends a message to all the other processors
telling them to stop what they're doing, and it waits for them to say
they're ready.
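A toy model of that rendezvous, including the timeout that features in the discussion above (threads stand in for CPUs here; the real hypervisor uses an IPI-based stop-machine mechanism, not anything like pthreads):

```python
import threading
import time

# Toy model of the patching rendezvous: the master asks every "CPU"
# (thread) to check in at a barrier; if any CPU is too busy to arrive
# within the timeout, the barrier breaks and patching is aborted.
def try_rendezvous(cpu_delays, timeout):
    barrier = threading.Barrier(len(cpu_delays) + 1)  # CPUs + master
    def cpu(delay):
        time.sleep(delay)              # pretend to finish current work
        try:
            barrier.wait(timeout)
        except threading.BrokenBarrierError:
            pass                       # rendezvous was abandoned
    threads = [threading.Thread(target=cpu, args=(d,)) for d in cpu_delays]
    for t in threads:
        t.start()
    try:
        barrier.wait(timeout)          # master waits for everyone
        quiesced = True
    except threading.BrokenBarrierError:
        quiesced = False               # someone failed to check in
    for t in threads:
        t.join()
    return quiesced

print(try_rendezvous([0.0, 0.0, 0.0], timeout=2.0))  # all CPUs idle
print(try_rendezvous([0.0, 0.0, 1.0], timeout=0.2))  # one busy CPU
```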

You could imagine a bug in either of these two which might be
triggerable by the guest.  For instance, there could be a bug in the
code which checks whether the livepatch can be applied, such that the
guest can make the check fail.  Or, there