On 1/23/20 5:27 AM, Daniel P. Berrangé wrote:
> On Wed, Jan 22, 2020 at 05:42:10PM -0500, John Snow wrote:
>>
>>
>> On 12/24/19 8:00 AM, Daniel P. Berrangé wrote:
>>> Based on experience in libvirt, this is an even larger job than (4),
>>> as the feature set here is huge.  Much of it directly ties into the
>>> config problem, as to deal with SELinux / namespace setup the code
>>> needs to understand what resources to provide access to. This
>>> requires a way to express 100% coverage of all QEMU configuration
>>> in use & analyse it to determine what resources it implies. So this
>>> ties strongly into QAPI-ification completion.
>>
>> Is it totally bonkers to suggest that QEMU provide a method of digesting
>> a given configuration and returning a configuration object that a
>> standalone jailer can use?
>>
>> So we have a QEMU manager, the generic jailer, and QEMU. QEMU and the
>> manager cooperate to produce the jailing configuration, and the jailer
>> does what we ask it to.
> 
> It isn't clear what you mean by "QEMU" here. If this is QEMU, the system
> emulator process, then this is the untrustworthy part of the stack,
> so the jailer must not use any data that QEMU is providing. In fact
> during startup the jailer does its work before QEMU even exists.
> 

I worried about this. Hence the "Nuts?" ask. It sounds like the ultimate
problem is nobody can know -- except QEMU -- what permissions are truly
needed for a given configuration. Even if we had an immaculate API, how
would anyone except QEMU developers know?

Trial and error, perhaps, on behalf of the jailer developers. Trial and
error is not the greatest feature of a security mechanism. Clearly, a
lot of effort has been spent to get libvirt's implementation correct,
but Stefan raises the idea that other projects have need of
understanding how to map QEMU configurations to appropriate jails.

Worse, it could still change on a whim. We (QEMU developers) probably
are not used to thinking of permitted syscall lists as ABI that we
strive to maintain. It can change.

How do we make this easier in a way that doesn't trust QEMU? I feel like
QEMU needs to provide *some* kind of information that can be used to
build better jailing configurations...

> There are aspects to the confinement that use / rely on knowledge that
> QEMU doesn't normally have, or are expressed in a different way than
> that which QEMU uses, or need to take a different implementation
> approach to that which QEMU normally has.
> 
> For networking, for example, from QEMU's config POV, there's just a
> TAP file descriptor. There are then a huge number of ways in which
> that TAP FD has been connected to the network in the host that are
> invisible to QEMU. Plain bridge, openvswitch bridge, macvtap device
> all with varying configs. Knowledge of this is relevant to the manager
> process and the jailer but irrelevant to QEMU.
> 
> When configuring disks we have technical issues. For example we need
> to identify the full backing chain and grant the appropriate permissions
> on this. Even if there was a libqemublock.so, libvirt would not use this
> because the QEMU storage code design is not reliable & minimal enough.
> For example to just query the backing file, QEMU opens the qcow2 and
> parses all the data about it, building up L1/L2 tables, and other
> data structures involved. It is trivial to create qcow2 files which
> result in both memory and CPU denial of service merely from opening
> the file.  Libvirt's approach to this is minimalist, just having a
> data table of offsets to the key fields in each file format. So we
> can extract the backing file & its format without reading anything
> else from the disk.
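
To make the offset-table idea above concrete, here is a minimal sketch of
reading only the backing file name out of a qcow2 header. It is an
illustration, not libvirt's actual code: the function name and the
max_name_len cap are hypothetical, and it relies only on the fixed qcow2
header fields (magic at byte 0, version at 4, backing_file_offset at 8,
backing_file_size at 16, all big-endian). The backing format lives in a
header extension and is deliberately left out here.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"


def qcow2_backing_file(header, max_name_len=1023):
    """Return the backing file name found in the leading bytes of a qcow2
    image, or None if the image has no backing file.

    Only fixed-offset header fields are read; no L1/L2 tables or other
    metadata are touched, so a hostile image cannot make this loop or
    allocate large amounts of memory.
    """
    if len(header) < 20 or header[:4] != QCOW2_MAGIC:
        raise ValueError("not a qcow2 header")
    # version (u32), backing_file_offset (u64), backing_file_size (u32)
    _version, offset, size = struct.unpack_from(">IQI", header, 4)
    if offset == 0:
        return None
    if size > max_name_len:
        raise ValueError("backing file name too long")
    if offset + size > len(header):
        raise ValueError("backing file name lies outside the bytes read")
    return header[offset:offset + size].decode("utf-8", "surrogateescape")
```

A caller would read a small, bounded prefix of the image (a few KiB, say)
and pass it in, which keeps the cost independent of whatever the rest of
the file claims about itself.
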
> 
> When configuring chardevs there is a choice of how to do it - we
> could just pass the UNIX socket path in, or we could create the
> UNIX socket ourselves & pass in the pre-opened FD. Both are equally
> functional from QEMU's POV and the end user's POV, but passing a
> pre-opened FD is more convenient for libvirt's needs as it allows
> for race-free startup synchronization between libvirt & QEMU, or
> rather QMP.  The different options here though, have different
> needs on the jailer, because extra steps are needed when passing
> a pre-opened FD to get the SELinux labelling right. QEMU doesn't
> know which approach the mgmt app will want to take, so we can't
> ask QEMU how the jailer should be configured - the mgmt app needs
> to make that decision.
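
For anyone following along, the pre-opened FD variant can be sketched
roughly as below. This is only an illustration under stated assumptions:
a small Python child stands in for QEMU, the function and constant names
are made up, and the SELinux labelling step mentioned above is omitted
entirely. The point it shows is that the manager binds and listens before
the child exists, so it can connect right away without polling for the
socket path to appear.

```python
import socket
import subprocess
import sys

# Stand-in for QEMU: adopt the inherited listening socket and greet the
# first client. (Hypothetical child, for illustration only.)
CHILD_SNIPPET = (
    "import socket, sys\n"
    "fd = int(sys.argv[1])\n"
    "srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM, fileno=fd)\n"
    "conn, _ = srv.accept()\n"
    "conn.sendall(b'ready')\n"
    "conn.close()\n"
)


def spawn_with_preopened_socket(path):
    """Create a listening UNIX socket at `path`, then start the child with
    that descriptor inherited. Returns the Popen object for the child."""
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(1)
    try:
        fd = listener.fileno()
        child = subprocess.Popen(
            [sys.executable, "-c", CHILD_SNIPPET, str(fd)],
            pass_fds=(fd,),  # keep this descriptor open in the child
        )
    finally:
        # The child holds its own copy; the manager no longer needs this one.
        listener.close()
    return child
```

Because the socket is already listening when the child is spawned, a
connect() from the manager queues in the backlog even if the child has not
reached accept() yet.
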
> 
> Essentially we have 2 configuration formats - the high level one
> that the mgmt app layer uses & the low level one that QEMU uses.
> The component in the stack which maps between the two config
> formats is the one that has the knowledge to configure the
> jailer. This isn't QEMU. It is whatever is immediately above QEMU,
> currently libvirt, but something conceptually equivalent to the
> role libvirt's QEMU driver impl fills.
> 
> Regards,
> Daniel
> 

-- 
—js

