Re: [Xen-devel] Livepatching and Xen Security
> > 1. Having tested live-patching thoroughly for at least some version of > > the codebase > > > > 2. Having tested live-patching for one of the Xen 4.9 RCs. > > > > Thoughts? > > As a statement of what XenServer is doing: As a statement of what Oracle is doing. We have been using livepatching for a year or so. It is a bit older (thanks to Xend dependency, ) so not as fresh as Xen 4.9. We had quite a few of livepatches, including some that are not XSAs, with success. We did run in some issues: - We compiled Xen with '--maxcpus=384' and the livepatch tools didn't include that, which made all the GCC compiled local variable names different. Once we got the --maxcpus=XYZ to match it all was good. But that took a while to figure out. - We had an interesting issue where the .fixup and .ex_table were not properly updated (Ross fixed that in the tool upstream and it was backported in livepatch-tools). That was in Xen 4.8 timeframe. https://github.com/rosslagerwall/livepatch-build-tools/commit/ae7ae0c31866f6ee2715a601fd5067d700d6084a - Replacing the livepatch with another hit a snag if both livepatches had the same symbol name (x86_emulate.c#_get_fpu). We came up with a skanky tool (symbol_rename) that just renames symbols which we use to always rename symbols (symbol_rename 83e0707.livepatch "x86_emulate.c#_get_fpu" get_fpu_83e0707) Those are livepatch-tools issues, and not the hypervisor code (albeit the last one could be fixed in the hypervisor by having code to deal with global and local symbol and ignoring collision with local symbols). With the hypervisor code, we have not had any issues - it has been running smoothly with various types of guests. Also I've been running (almost every night) an test of the livepatches that are part of the Xen source. > > Independent of this, the nature of what qualifies as "a correct patch" > is subjective and very context dependent. Consider a scenario with two > users, the same version of the livepatch tools, an identical source > patch, and an identical source version of Xen. There is a very real > possibility that these two users could get one valid and one invalid > patch based solely on something like the compiler settings used to build > the hypervisor they are patching. Yes. It is imperative that the livepatch be built on the same exact compiler as what the hypervisor was built with. Fortunatly the config.h file exposes all of that so it is easy enough to verify that. ..snip.. > Therefore, I think it would be a mistake for us to include anything > pertaining to "creating a livepatch, correct or otherwise" within a > support statement. There are many variables which we as upstream can't > control. It may be good to include an FAQ or such describing some of these issues (aka an knowledge base) that detail our findings and how we worked around them. I will update the Wiki with it regardless of this discussion. > > As for the 4th point, about what a guest can do to prevent application > of a livepatch. > > The default timeout is insufficient to quiesce Xen if a VM with a few > VCPUs is migrating. In this scenario, I believe p2m_lock contention is > the underlying reason, but the point stands that there are plenty of > things a guest can do to prevent Xen being able to suitably quiesce. > > As a host administrator attempting to apply the livepatch, you get > informed that Xen failed to quiesce and the livepatch application > failed. Options range from upping the timeout on the next patching > attempt, to possibly even manually pausing the troublesome VM for a second. Pausing a VM is aceeptable I would think. > > I also think it unwise to consider any scenarios like this within the > security statement, otherwise we will have to issue an XSA stating > "Guests doing normal unprivileged things can cause Xen to be > insufficient quiescent to apply livepatches with the deliberately > conservative defaults". What remediation would we suggest for this? xl pause :-) > > > On the points of unexpected access to the hypercalls, and Xen doing the > wrong thing when presented with a legitimate correct livepatch, I think > these are in principle fine for inclusion within a support statement. Yes. > > I would ask however how confident we are that there are no ELF parsing > bugs in the code? I think it might be very prudent to try and build a > userspace harness for it and let ALF have a go. Jan did an excellent job when reviewing the code. But there is of course the possibility that something slipped our mind. I would (in my opinion as livepatch maintainer) not to gate the support part on this as well, I have no clue how to setup ALF and this may take quite a while to get done. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Livepatching and Xen Security
Andrew Cooper writes ("Re: [Xen-devel] Livepatching and Xen Security"): > livepatching doesn't use libelf. > > It is a new ELF parsing implementation. I don't think we care very much about bugs in the livepatching elf parser. The livepatches are all completely trusted in any case. Furthermore, I don't think we consider the binary code or pieces of the headers or bits of the livepatching loader tools memory map or anything secret. So uninitialised structure bugs just leak things we don't care about. Does that make sense ? Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Livepatching and Xen Security
On 19/05/17 15:32, Wei Liu wrote: > On Thu, May 18, 2017 at 08:07:00PM +0100, Andrew Cooper wrote: >> I would ask however how confident we are that there are no ELF parsing >> bugs in the code? I think it might be very prudent to try and build a >> userspace harness for it and let ALF have a go. >> > There is already a fuzzing harness in tools/fuzz for libelf. Feel free > to use it with either AFL or LLVM fuzzer. :-) > > If it doesn't work, report it and I will fix it. livepatching doesn't use libelf. It is a new ELF parsing implementation. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Livepatching and Xen Security
On Thu, May 18, 2017 at 08:07:00PM +0100, Andrew Cooper wrote: > I would ask however how confident we are that there are no ELF parsing > bugs in the code? I think it might be very prudent to try and build a > userspace harness for it and let ALF have a go. > There is already a fuzzing harness in tools/fuzz for libelf. Feel free to use it with either AFL or LLVM fuzzer. :-) If it doesn't work, report it and I will fix it. > ~Andrew > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Livepatching and Xen Security
On 18/05/17 17:40, George Dunlap wrote: > There are four general areas I think there may be bugs. > > ## Unprivileged access to Livepatching hypercalls > > ## Bugs in the patch creation tools which create patches with vulnerabilities > > ## Bugs in the patch-application code such that vulnerabilities exist > after application > > ## Bugs which allow a guest to prevent the application of a livepatch > > # Testing status and requirements > > I'm told we already test that unprivileged guests cannot access the > Livepatch hypercalls in osstest; if so, that aspect should should be > covered. Specifically, http://xenbits.xen.org/docs/xtf/test-livepatch-priv-check.html (There is a docs bug I have noticed while grabbing that link, which I have just pushed a fix for. The live docs will be updated whenever cron next runs.) (I'd also like to take this opportunity to highlight an issue which became apparent while writing that test; unstable hypercall ABIs, wherever they reside, make this kind of testing prone to false negatives.) > All that's needed would be for vendors to describe what kinds of > testing they have done for Livepatching. I think there are two > factors which come into play: > > 1. Having tested live-patching thoroughly for at least some version of > the codebase > > 2. Having tested live-patching for one of the Xen 4.9 RCs. > > Thoughts? As a statement of what XenServer is doing: XenServer 7.1 is based on Xen 4.7 and we are providing livepatches (where applicable) with hypervisor hotfixes. Thus far, XSAs 204, 207, 212-215 have been included in livepatch form, as well as a number of other general bugfixes which were save to livepatch. For both these hotfixes, we had to bugfix the livepatch creation tools to generate a livepatch. We also had to modify the XSA 213 patch to create a livepatch. The (pre 4.8) Xen code was buggy and used the .fixup section when it should have used .text.unlikley, which caused the livepatch tools to fail a cross-check of the exception frame references when building the patch. Thus, we are 0 for 2 on the tools being able to DTRT when given a set of real-world fixes. Independent of this, the nature of what qualifies as "a correct patch" is subjective and very context dependent. Consider a scenario with two users, the same version of the livepatch tools, an identical source patch, and an identical source version of Xen. There is a very real possibility that these two users could get one valid and one invalid patch based solely on something like the compiler settings used to build the hypervisor they are patching. From an XSA point of view, we do not want to be issuing advisories saying "If you are on OS $A, with livepatch tools $B, Hypervisor $C compiled with these specific build options, then trying to create a livepatch for patch $D will appear to work properly but leave a timebomb in your hypervisor". ISTR an issue which hit during development was where CentOS releasing an minor update to GCC and caused chaos by altering how the string literals got sorted. What if a user creates a livepatch for a change which isn't remotely safe to livepatch, uploads it, and their hypervisor goes bang? This would qualify under the definition of "correct" in so far as the patch was correctly doing what it was told, and thus, fall within the security criteria presented here. There is already a very high user requirement in the first place to evaluate whether patches are safe to livepatch. This includes interaction with other livepatches, interactions with patches in the vendors patch queue, interaction with customer hardware, and there is no way this can be decided automatically. Therefore, I think it would be a mistake for us to include anything pertaining to "creating a livepatch, correct or otherwise" within a support statement. There are many variables which we as upstream can't control. As for the 4th point, about what a guest can do to prevent application of a livepatch. The default timeout is insufficient to quiesce Xen if a VM with a few VCPUs is migrating. In this scenario, I believe p2m_lock contention is the underlying reason, but the point stands that there are plenty of things a guest can do to prevent Xen being able to suitably quiesce. As a host administrator attempting to apply the livepatch, you get informed that Xen failed to quiesce and the livepatch application failed. Options range from upping the timeout on the next patching attempt, to possibly even manually pausing the troublesome VM for a second. I also think it unwise to consider any scenarios like this within the security statement, otherwise we will have to issue an XSA stating "Guests doing normal unprivileged things can cause Xen to be insufficient quiescent to apply livepatches with the deliberately conservative defaults". What remediation would we suggest for this? On the points of unexpected access to the hypercalls, and Xen doing the wrong thing when presented with a legitimate
Re: [Xen-devel] Livepatching and Xen Security
On 18/05/2017 17:53, "Ian Jackson"wrote: >George Dunlap writes ("Livepatching and Xen Security"): >> # Executive summary > >I am completely in agreement with your analysis and your conclusions. Me too. I am not sure though whether we need a vote or lazy consensus. For Credit2 (see https://lists.xenproject.org/archives/html/xen-devel/2016-11/msg00171.html) we used a code patch removing Experimental from KCONFIG, which implicitly led to active confirmation by project leadership members via Acked-by tags. In other words, we executed the equivalent of a vote. From a process perspective https://xenproject.org/governance.html#lazyconsensus should be sufficient because we are applying a process, not changing one. In addition lazy consensus decisions can only be overturned if the project leadership agrees collectively to do so, because the decision is too important for lazy consensus. As every leadership member is on the list, there would be ample opportunity to raise objections. My gut feeling though is that Leadership Team members and Security Team members should probably actively agree/disagree to proposals related to whether a feature is supported and thus security supported to avoid any future issues. George already expressed an implicit +1 (by making the proposal) and Ian an explicit +1. Regards Lars ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Livepatching and Xen Security
George Dunlap writes ("Livepatching and Xen Security"): > # Executive summary I am completely in agreement with your analysis and your conclusions. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] Livepatching and Xen Security
# Executive summary * It is important for the Livepatching feature to be declared "security supported". * At the moment it's not clear whether we can get osstest support in by the release or not, or if we did whether that would be considered sufficient to consider it supported * I would argue that if various vendors can demonstrate that they have tested the key functionality of the feature in one of the 4.9 RCs at least as well as osstest would to, that should be sufficient to declare it security supported for 4.9. # Why is this important? Having a feature "security supported" does two things. First of all, it indicates to users which parts of Xen are considered safe to use in environments where security is important. One of the key selling points of Xen is its security compared to other hypervisors. But there is a lot of functionality which *can* be enabled in Xen that isn't really secure: for instance, there are hundreds of devices emulated by qemu that you *can* enable, but for which it would not be really safe in the face of a determined attacker. Statements of security support help indicate to users which subset of functionality they should use. Secondly, it indicates whether a bug discovered in the feature will go through the XenProject Security Response Process -- i.e., whether it will be issued an XSA. Issuing an XSA does two things. First, it allows people on the predisclosure list to update their systems before issues are made known to the entire world. Secondly, it gives people *not* on the predisclosure list a structured way of finding out about important security issues about which they may need to take some action. # Potential Livepatch security issues To think about what "security support for Livepatching" means, we have to have to answer the question, "If there were a security-related bug in Livepatching, what would it look like?" This helps us think about what kind of promises we would give, and what kinds of testing we would consider necessary to provide security support. In each case, we must ask: Is this the kind of bug that software vendors and cloud providers would want embargoed notice for? Is this the kind of bug that members not on the pre-disclosure list would like to be notified of so they can react to quickly? There are four general areas I think there may be bugs. ## Unprivileged access to Livepatching hypercalls First, and most obvious, would be a bug that allowed unprivileged guests access to the livepatching hypercalls. Livepatching's explicit purpose is to make it as easy as possible to execute arbitrary code inside the hypervisor, so any bug which gave an unprivileged guest access would obviously be a major security issue. ## Bugs in the patch creation tools which create patches with vulnerabilities Suppose that we have hypervisor version A, which has a security vulnerability, and a software patch which generates hypervisor version B, which, when built and booted as a normal binary will not have the vulnerability. The livepatch creation tools are designed to help developers create a binary blob, such that when uploaded into a running copy of hypervisor A, will cause it to behave like hypervisor B. But suppose there was a bug in the tools, such that the blob would successfully upload, but leave open vulnerabilities -- either by not correctly fixing the original vulnerability, or by opening yet a different vulnerability. ## Bugs in the patch-application code such that vulnerabilities exist after application Suppose instead that we have a binary patch which is correct -- i.e., if it were applied as designed it would close the vulnerability and not open a new one. But suppose that there was a bug in the patch application code such that, after the patch process reported success, known vulnerabilities existed. As a simple example, suppose that the patch was simply not applied, but that it was reported as applied anyway. Or suppose that the patch was applied partially, but not completely. For instance, say there was an off-by-one error that caused only 2 of 3 required functions to be patched. Or again, suppose there was a bug in the patch application code such that, as a side effect of applying the patch and closing the original vulnerability, a secondary vulnerability was opened up -- say, an accidental modification to a hypercall table or IDT entries. ## Bugs which allow a guest to prevent the application of a livepatch In order to apply a patch, the livepatching code first makes several checks, then pauses the entire system while it finishes the change. To pause the system, it sends a message to all the other processors telling them to stop what they're doing, and it waits for them to say they're ready. You could imagine a bug in either of these two which might be triggerably by the guest. For instance, there could be a bug in the code which checks whether the livepatch can be applied, such that the guest can make the check fail. Or, there