Hi Julien,

Thanks for the feedback. I'm answering the straightforward issues here
and saving the rest for the emerging thread.

On Mon, Dec 4, 2023 at 8:24 PM Julien Grall <jul...@xen.org> wrote:
>
> Hi Jens,
>
> On 04/12/2023 07:55, Jens Wiklander wrote:
> > When an FF-A enabled guest is destroyed it may leave behind memory
> > shared with SPs. This memory must be reclaimed before it's reused or an
> > SP may make changes to memory used by a new unrelated guest. So when the
> > domain is torn down, add FF-A requests to reclaim all remaining shared
> > memory.
> >
> > SPs in the secure world are notified using VM_DESTROYED that a guest has
> > been destroyed. An SP is supposed to relinquish all shared memory to allow
> > reclaiming the memory. The relinquish operation may need to be delayed if
> > the shared memory is for instance part of a DMA operation.
> >
> > If the FF-A memory reclaim request fails, return -ERESTART to retry.
> > This will effectively block the destruction of the guest until all
> > memory has been reclaimed.
> >
> > Signed-off-by: Jens Wiklander <jens.wiklan...@linaro.org>
> > ---
> > Hi,
> >
> > This patch is a bit crude, but gets the job done. In a well-designed
> > system this might even be good enough, since the SP or the secure world
> > will let the memory be reclaimed and we can move on. But if reclaiming
> > the memory is refused for some reason, it must not be possible to reuse
> > the memory.
>
> IIUC, we are trying to harden against a buggy SP. Is that correct?
>
> >
> > These shared memory ranges are typically quite small compared to the
> > total memory usage of a guest, so it would be an improvement if only the
> > refused shared memory ranges were set aside from future reuse while the
> > guest was destroyed and other resources made available for reuse. This
> > could be done by, for instance, assigning the refused shared memory
> > ranges to a dummy VM like DOMID_IO.
>
> I like the idea to use a dummy VM, but I don't think DOMID_IO is right.
> Once teardown has completed, the domain will stay around until the last
> reference on all pages is dropped. At this point, the amount of memory
> left over is minimal (this is mostly bookkeeping in Xen).
>
> From the userland PoV, the domain will still show up in the list but
> tools like "xl list" will show "(null)". They are called zombie domains.
>
> So I would consider keeping the same domain around. The advantage is you
> can call "xl destroy" again to retry the operation.
>
> >
> > Thanks,
> > Jens
> > ---
> >   xen/arch/arm/tee/ffa.c | 36 ++++++++++++++++++++++++++++++++++++
> >   1 file changed, 36 insertions(+)
> >
> > diff --git a/xen/arch/arm/tee/ffa.c b/xen/arch/arm/tee/ffa.c
> > index 183528d13388..9c596462a8a2 100644
> > --- a/xen/arch/arm/tee/ffa.c
> > +++ b/xen/arch/arm/tee/ffa.c
> > @@ -1539,6 +1539,7 @@ static bool is_in_subscr_list(const uint16_t *subscr, uint16_t start,
> >   static int ffa_domain_teardown(struct domain *d)
> >   {
> >       struct ffa_ctx *ctx = d->arch.tee;
> > +    struct ffa_shm_mem *shm, *tmp;
> >       unsigned int n;
> >       int32_t res;
> >
> > @@ -1564,10 +1565,45 @@ static int ffa_domain_teardown(struct domain *d)
> >               printk(XENLOG_ERR "ffa: Failed to report destruction of vm_id %u to  %u: res %d\n",
> >                      get_vm_id(d), subscr_vm_destroyed[n], res);
> >       }
> > +    /*
> > +     * If this function is called again due to -ERESTART below, make sure
> > +     * not to send the FFA_MSG_SEND_VM_DESTROYED's.
> > +     */
> > +    subscr_vm_destroyed_count = 0;
>
> AFAICT, this variable is global. So wouldn't you effectively break other
> domains if, let's say, the unmapping error is temporary?

You're right! I'll have to change this part a bit.

>
> >
> >       if ( ctx->rx )
> >           rxtx_unmap(ctx);
> >
> > +
> > +    list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list)
> > +    {
> > +        register_t handle_hi;
> > +        register_t handle_lo;
> > +
> > +        uint64_to_regpair(&handle_hi, &handle_lo, shm->handle);
> > +        res = ffa_mem_reclaim(handle_lo, handle_hi, 0);
>
> Is this call expensive? If so, we may need to handle continuation here.
>
> > +        if ( res )
> > +        {
> > +            printk(XENLOG_INFO "ffa: Failed to reclaim handle %#lx : %d\n",
> > +                   shm->handle, res);
>
> I think you want to use XENLOG_G_INFO to use the guest ratelimit. Also,
> I would suggest to print the domain ID in the logs (see '%pd').

Thanks for the tip, I'll update accordingly here and at the other places.

>
>
> > +        }
> > +        else
> > +        {
> > +            printk(XENLOG_DEBUG "ffa: Reclaimed handle %#lx\n", shm->handle);
>
> Same here. You want to use XENLOG_G_DEBUG and print the domain ID.
>
> > +            ctx->shm_count--;
> > +            list_del(&shm->list);
> > +        }
> > +    }
>
> NIT: New line here please for clarity.

OK

>
> > +    if ( !list_empty(&ctx->shm_list) )
> > +    {
> > +        printk(XENLOG_INFO "ffa: Remaining unclaimed handles, retrying\n");
>
> Same as the other printks.
>
> > +        /*
> > +         * TODO: add a timeout where we either panic or let the guest be
> > +         * fully destroyed.
> > +         */
> Timeout with proper handling would be a solution. I am not sure about
> panic-ing. Do you think the TEE would be in a bad state if we can't
> release memory?

No, that's not likely.

Thanks,
Jens

>
> > +        return -ERESTART;
> > +    }
> > +
> >       XFREE(d->arch.tee);
> >
> >       return 0;
>
> Cheers,
>
> --
> Julien Grall
