Re: [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters

Paul Moore Mon, 05 Nov 2012 13:58:39 -0800

On Monday, November 05, 2012 09:39:46 AM Corey Bryant wrote:
> On 11/02/2012 06:14 PM, Paul Moore wrote:
> > On Friday, November 02, 2012 06:00:29 PM Corey Bryant wrote:
> >> On 11/02/2012 05:29 PM, Paul Moore wrote:
> >>> On Tuesday, October 23, 2012 03:55:31 AM Eduardo Otubo wrote:
> >>>> This patch includes a second whitelist right before the main loop. It's
> >>>> a smaller and more restricted whitelist, excluding execve() among many
> >>>> others.
> >>>> 
> >>>> v2: * ctx changed to main_loop_ctx
> >>>>     * seccomp_on now inside ifdef
> >>>>     * open syscall added to the main_loop whitelist
> >>>> 
> >>>> Signed-off-by: Eduardo Otubo <ot...@linux.vnet.ibm.com>
> >>> 
> >>> Unfortunately qemu.org seems to be down for me today so I can't grab the
> >>> latest repo to review/verify this patch (some of my comments/assumptions
> >>> below may be off) but I'm a little confused, hopefully you guys can help
> >>> me out, read below ...
> >>> 
> >>> The first call to seccomp_install_filter() will set up a whitelist for
> >>> the syscalls that have been explicitly specified; all others will hit
> >>> the default action TRAP/KILL.  The second call to
> >>> seccomp_install_filter() will add a second whitelist for another set of
> >>> explicitly specified syscalls; all others will hit the default action
> >>> TRAP/KILL.
> >> 
> >> That's correct.  The goal was to have a 2nd list that is a subset of the
> >> 1st list, and also not include execve() in the 2nd list.  At this point
> >> though, since it's late in the release, we've expanded the 2nd list to
> >> be the same as the 1st with the exception of execve() not being in the
> >> 2nd list.
> >> 
> >>> The problem occurs when the filters are executed in the kernel when a
> >>> syscall is executed.  On each syscall the first filter will be executed
> >>> and the action will either be ALLOW or TRAP/KILL; next the second filter
> >>> will be executed and the action will either be ALLOW or TRAP/KILL.  Since
> >>> the kernel always takes the most restrictive (lowest integer action
> >>> value) action when multiple filters are specified, I think your double
> >>> whitelist approach is going to have some inherent problems.
> >> 
> >> That's something I hadn't thought of.  But TRAP and KILL won't exist
> >> together in our whitelists, and our 2nd whitelist is a subset of the
> >> 1st.  So do you think there would still be problems?
> > 
> > It doesn't really matter if the default action is TRAP and/or KILL, the
> > point is that if you use a second whitelist after an initial whitelist
> > the effective seccomp filter is going to be only the syscalls you
> > explicitly allowed in the second whitelist.  When using multiple seccomp
> > filters on a process, all filters are executed for each syscall and the
> > most restrictive action of all the filters is the action that the kernel
> > takes.
> > 
> > Don't get me wrong, I like the idea of progressively restricting QEMU, but
> > if you are going to load multiple seccomp filters into the kernel, you
> > almost certainly want the first whitelist filter to be the union of
> > all the seccomp filters you intend to load, with all subsequent filters
> > being blacklists which progressively remove syscalls which are allowed by
> > the initial whitelist.
> 
> That's what we're doing though.  The first whitelist is a union of all
> subsequent filters.  Of course there's only one subsequent filter at
> this point.  But the idea is to start out with a large whitelist for
> initialization and then tighten it up before the main loop when
> presumably fewer syscalls are needed.


Okay, that's good ... It still seems a bit odd to me; I think a whitelist
first, blacklist second is a more intuitive and efficient solution, but that
may just be me.
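
As a rough illustration of the filter-stacking behavior discussed above, here
is a small Python model, not kernel or QEMU code. The action values match the
kernel's SECCOMP_RET_* constants, but the helper names and the syscall lists
are made up for the example, and the real kernel compares only the action
portion of each filter's return value.

```python
# Simplified model of how the kernel combines stacked seccomp filters.
# Assumption: each filter is modeled as a function from a syscall name to
# an action value; the syscall lists are illustrative, not QEMU's lists.
SECCOMP_RET_KILL = 0x00000000
SECCOMP_RET_TRAP = 0x00030000
SECCOMP_RET_ALLOW = 0x7FFF0000


def whitelist(allowed, default=SECCOMP_RET_KILL):
    """Filter that allows the listed syscalls and denies everything else."""
    allowed = frozenset(allowed)
    return lambda syscall: SECCOMP_RET_ALLOW if syscall in allowed else default


def blacklist(denied, action=SECCOMP_RET_KILL):
    """Filter that denies the listed syscalls and allows everything else."""
    denied = frozenset(denied)
    return lambda syscall: action if syscall in denied else SECCOMP_RET_ALLOW


def effective_action(filters, syscall):
    """Every loaded filter runs; the lowest (most restrictive) value wins."""
    return min(f(syscall) for f in filters)


# Hypothetical lists: the main-loop list is the init list minus execve.
init_list = ["read", "write", "open", "mmap", "execve"]
main_loop_list = ["read", "write", "open", "mmap"]

double_whitelist = [whitelist(init_list), whitelist(main_loop_list)]
white_then_black = [whitelist(init_list), blacklist(["execve"])]

for name in init_list + ["fork"]:
    a = effective_action(double_whitelist, name)
    b = effective_action(white_then_black, name)
    print("%-8s double=%#010x white+black=%#010x" % (name, a, b))
```

In this model both arrangements produce the same effective policy as long as
the second whitelist is a subset of the first; the difference is only in how
many rules the second filter has to carry.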

> My concern is getting the two whitelists correct.  We keep uncovering
> new syscalls as we test.

Of course, this whole whitelist/blacklist discussion assumes the list of 
allowed syscalls is correct.

> >>> I might suggest an initial, fairly permissive
> >>> whitelist followed by a follow-on blacklist if you want to disable
> >>> certain syscalls.
> >> 
> >> I have to admit I'm nervous about this at this point in QEMU 1.3.  It's
> >> getting late in the cycle and we'd hoped to get this in earlier.  A more
> >> permissive whitelist is probably going to be the only way we'll
> >> successfully turn -sandbox on by default at this point in QEMU 1.3.
> > 
> > That's fine, I just wanted to point out that I think the multiple whitelist
> > approach is going to have some inherent problems.
> 
> Are you thinking there will be problems with the current two-whitelist
> approach, or are you thinking there would be problems in the future if
> we continued restricting the QEMU process with further whitelists?  If
> you mean the latter, then I understand your point since QEMU is a single
> process that requires a certain subset of syscalls.

I was originally concerned that you were structuring the whitelists 
incorrectly, but it sounds like that is not the case - that's good.

I'm still concerned that the double whitelist approach may result in bigger
syscall filters than necessary, but until we get a final-ish list there is no
point worrying about that.
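
To make the size concern concrete, here is a back-of-the-envelope sketch in
Python. It assumes the number of syscall rules is a reasonable proxy for the
size of the generated filter, which is a simplification, and the function name
and the list sizes are hypothetical.

```python
def rule_counts(init_allowed, main_loop_allowed):
    """Count syscall rules needed by each two-filter arrangement.

    Assumption: one rule per listed syscall, and the main-loop whitelist
    is a subset of the initial whitelist.
    """
    init_allowed = set(init_allowed)
    main_loop_allowed = set(main_loop_allowed)
    if not main_loop_allowed <= init_allowed:
        raise ValueError("main-loop whitelist must be a subset of the initial one")
    removed = init_allowed - main_loop_allowed
    return {
        # Second filter repeats every syscall that stays allowed.
        "double_whitelist": len(init_allowed) + len(main_loop_allowed),
        # Second filter lists only the syscalls being taken away.
        "whitelist_then_blacklist": len(init_allowed) + len(removed),
    }


# Hypothetical sizes: 100 syscalls at init, only execve dropped afterwards.
init = ["syscall_%d" % i for i in range(99)] + ["execve"]
main_loop = [s for s in init if s != "execve"]
counts = rule_counts(init, main_loop)
print(counts)
```

The gap narrows as the second whitelist shrinks, so which arrangement is
smaller depends on how much of the initial list survives into the main loop.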

> I'm thinking once the two whitelists are in place, we can move on to
> restricting syscall parameters in the existing whitelists where it makes
> sense ...

Yep, sounds reasonable.

> and then look into your original decomposition approach, where
> parts of qemu are run in separate threads/processes which would allow
> much tighter seccomp restriction.

Ultimately I think this is the right solution if we want to get serious about 
making QEMU more resistant to attacks from malicious guests.

-- 
paul moore
security and virtualization @ redhat
