On Wed, 14 Aug 2019, Julien Grall wrote:
> Hi Oleksandr,
> 
> On 13/08/2019 13:35, Oleksandr wrote:
> > 
> > On 12.08.19 22:46, Julien Grall wrote:
> > > Hi Oleksandr,
> > 
> > Hi, Julien
> > 
> > 
> > > 
> > > On 8/12/19 1:01 PM, Oleksandr wrote:
> > > > On 12.08.19 14:11, Julien Grall wrote:
> > > > > On 02/08/2019 17:39, Oleksandr Tyshchenko wrote:
> > > > > > From: Oleksandr Tyshchenko <oleksandr_tyshche...@epam.com>
> > > > > > 
> > > > > > This patch adds minimal required support to General IOMMU framework
> > > > > > to be able to handle a case when IOMMU driver requesting deferred
> > > > > > probing for a device.
> > > > > > 
> > > > > > In order not to pull Linux's error code (-EPROBE_DEFER) to Xen
> > > > > > we have chosen -EAGAIN to be used for indicating that device
> > > > > > probing is deferred.
> > > > > > 
> > > > > > This is needed for the upcoming IPMMU driver which may request
> > > > > > deferred probing depending on what device will be probed the first
> > > > > > (there is some dependency between these devices, Root device must be
> > > > > > registered before Cache devices. If not the case, driver will deny
> > > > > > further Cache device probes until Root device is registered).
> > > > > > As we can't guarantee a fixed pre-defined order for the device nodes
> > > > > > in DT, we need to be ready for the situation where devices being
> > > > > > probed in "any" order.
> > > > > > 
> > > > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshche...@epam.com>
> > > > > > ---
> > > > > >   xen/common/device_tree.c            |  1 +
> > > > > >   xen/drivers/passthrough/arm/iommu.c | 35
> > > > > > ++++++++++++++++++++++++++++++++++-
> > > > > >   xen/include/asm-arm/device.h        |  6 +++++-
> > > > > >   xen/include/xen/device_tree.h       |  1 +
> > > > > >   4 files changed, 41 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
> > > > > > index e107c6f..6f37448 100644
> > > > > > --- a/xen/common/device_tree.c
> > > > > > +++ b/xen/common/device_tree.c
> > > > > > @@ -1774,6 +1774,7 @@ static unsigned long __init
> > > > > > unflatten_dt_node(const void *fdt,
> > > > > >           /* By default the device is not protected */
> > > > > >           np->is_protected = false;
> > > > > >           INIT_LIST_HEAD(&np->domain_list);
> > > > > > +        INIT_LIST_HEAD(&np->deferred_probe);
> > > > > 
> > > > > I am not entirely happy to add a new list_head field per node just for
> > > > > the benefits of boot code. Could we re-use domain_list (with a comment
> > > > > in the code and appropriate ASSERT)?
> > > > 
> > > > Agree that only boot code uses deferred_probe field. I will consider
> > > > re-using domain_list. Could you please clarify regarding ASSERT (where
> > > > to put and what to check).
> > > 
> > > What I meant is adding an ASSERT to check that np->domain_list is at empty
> > > at least before trying to add in the list. This would help to debug any
> > > potential issue if we end up to use domain_list earlier in the future. I
> > > can't see why it would as iommu is called earlier, but who knows :).
> > 
> > Got it. Thank you for clarification.
> > 
> > 
> > > > > > +
> > > > > >   static const struct iommu_ops *iommu_ops;
> > > > > >     const struct iommu_ops *iommu_get_ops(void)
> > > > > > @@ -42,7 +48,7 @@ void __init iommu_set_ops(const struct iommu_ops
> > > > > > *ops)
> > > > > >     int __init iommu_hardware_setup(void)
> > > > > >   {
> > > > > > -    struct dt_device_node *np;
> > > > > > +    struct dt_device_node *np, *tmp;
> > > > > >       int rc;
> > > > > >       unsigned int num_iommus = 0;
> > > > > >   @@ -51,6 +57,33 @@ int __init iommu_hardware_setup(void)
> > > > > >           rc = device_init(np, DEVICE_IOMMU, NULL);
> > > > > >           if ( !rc )
> > > > > >               num_iommus++;
> > > > > > +        else if (rc == -EAGAIN)
> > > > > > +            /*
> > > > > > +             * Driver requested deferred probing, so add this
> > > > > > device to
> > > > > > +             * the deferred list for further processing.
> > > > > > +             */
> > > > > > +            list_add(&np->deferred_probe, &deferred_probe_list);
> > > > > > +    }
> > > > > > +
> > > > > > +    /*
> > > > > > +     * Process devices in the deferred list if at least one
> > > > > > successfully
> > > > > > +     * probed device is present.
> > > > > > +     */
> > > > > 
> > > > > I think this can turn into an infinite loop if all device in
> > > > > deferred_probe_list still return -EDEFER_PROBE and num_iommus is a
> > > > > non-zero.
> > > > 
> > > > Agree.
> > > > 
> > > > 
> > > > > 
> > > > > A better condition would be to check that at least one IOMMU is added
> > > > > at each loop. If not, then we should bail with an error because it
> > > > > likely means something is buggy.
> > > > 
> > > > Sounds reasonable. Will do.
> > > > 
> > > > 
> > > > Just to clarify:
> > > > 
> > > >  >>> A better condition would be to check that at least one IOMMU is
> > > > added at each loop.
> > > > 
> > > > Maybe, not only added (rc == 0), but driver didn't request deferred
> > > > probe (rc != -EAGAIN).
> > > 
> > > I think adding an IOMMU is enough. If you return an error other than
> > > -EAGAIN here after deferring probing, then you are likely going to fail at
> > > the next loop. So better to stop early.
> > 
> > It makes sense.
> > 
> > 
> > > 
> > > 
> > > I realize this not what the current code is doing (I know I wrote it ;)).
> > > But I am not sure it is sane to continue if only part of the IOMMUs are
> > > initialized. Most likely you will see an error much later that may be not
> > > trivial to find out.
> > > 
> > > Imagine you want to passthrough you network card to a guest but the IOMMU
> > > initialization failed...
> > 
> > Oh, agree.
> > 
> > As I understand, the new strict logic would be the following:
> > 
> > If initialization for at least one IOMMU device failed (device_init returns
> > an error other than -EAGAIN), we should stop and return an error to upper
> > layer (even if num_iommus > 0). No matter whether it is during the first
> > attempt or after deferring probe. We don't allow the "I/O virtualisation" to
> > be enabled (iommu_enabled == true) with only part of the IOMMU devices being
> > initialized. Is my understanding correct?
> 
> Let me summarize the discussion we had on IRC :). Without your patch, Xen may
> initialize only half the IOMMUs. If the device is behind an IOMMU that wasn't
> initialized, then we have two possibility:
>    1) The device was already mark as protected (if using the old binding in
> the SMMU). Xen will not be able to assign the device to Dom0 and therefore Xen
> will crash (not able to build dom0). For domU, it will depend whether the
> configuration contain the options 'dtdev'. If the option is specified, then
> guest will fail to build. On the contrary if the option isn't specified then
> the guest will boot and this could either lead to transaction failure (if the
> IOMMU was already reset) or bypassing the IOMMU. Note that the latter can
> today happen if your IOMMU was disabled. But we can't do much against it.
>    2) The device is not marked as protected. Xen will not be able to "assign"
> the device to Dom0 and this could either lead to the device bypassing the
> IOMMU or a transaction failure. For domU, the problem is similar to 1).
> 
> In the case of the SMMU driver, we only support old bindings. So devices are
> marked as protected during SMMU initialization. This is done before the SMMU
> is reset. Before reset the SMMU will bypassed.
> 
> So the risk is to have an half secure system and may be unnoticed until later.
> I realize this is the current behavior, so not very ideal.
> 
> It feels to me if the user requested to use IOMMU then if we should panic if
> any of the available IOMMU are not initialized correctly. This will save a lot
> of debug afterwards.
> 
> @Stefano, any opinions?

I agree that we should enable all IOMMUs or none. I don't think we want
to deal with partially initialized IOMMUs systems.

Failure to enable all IOMMUs should lead to returning an error from the
relevant function (arch_iommu_populate_page_table?) which should
translate into Xen failing to boot and crashing. Which I think it is
what you are suggesting, right?

(I wouldn't call panic() inside the IOMMU specific initializer, because
there might be iommu= command line options for Xen that specify a
different desired outcome. I would deal with this case the same way we
would deal with an error during initialization of a single IOMMU.)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to