Hi Julien, Thanks for the feedback. I'm answering the straightforward issues here and saving the rest for the emerging thread.
On Mon, Dec 4, 2023 at 8:24 PM Julien Grall <jul...@xen.org> wrote: > > Hi Jens, > > On 04/12/2023 07:55, Jens Wiklander wrote: > > When an FF-A enabled guest is destroyed it may leave behind memory > > shared with SPs. This memory must be reclaimed before it's reused or an > > SP may make changes to memory used by a new unrelated guest. So when the > > domain is teared down add FF-A requests to reclaim all remaining shared > > memory. > > > > SPs in the secure world are notified using VM_DESTROYED that a guest has > > been destroyed. An SP is supposed to relinquish all shared memory to allow > > reclaiming the memory. The relinquish operation may need to be delayed if > > the shared memory is for instance part of a DMA operation. > > > > If the FF-A memory reclaim request fails, return -ERESTART to retry > > again. This will effectively block the destruction of the guest until > > all memory has been reclaimed. > > > > Signed-off-by: Jens Wiklander <jens.wiklan...@linaro.org> > > --- > > Hi, > > > > This patch is a bit crude, but gets the job done. In a well designed > > system this might even be good enough since the SP or the secure world > > will let the memory be reclaimed and we can move on. But, if for some > > reason reclaiming the memory is refused it must not be possible to reuse > > the memory. > > IIUC, we are trying to harden against a buggy SP. Is that correct? > > > > > These shared memory ranges are typically quite small compared to the > > total memory usage of a guest so it would be an improvement if only > > refused shared memory ranges where set aside from future reuse while the > > guest was destroyed and other resources made available for reuse. This > > could be done by for instance assign the refused shared memory ranges > > to a dummy VM like DOMID_IO. > > I like the idea to use a dummy VM, but I don't think DOMID_IO is right. > Once teardown has completed, the domain will stay around until the last > reference on all pages are dropped. At this point, the amount of memory > left-over is minimum (this is mostly bookeeping in Xen). > > From the userland PoV, the domain will still show-up in the list but > tools like "xl list" will show "(null)". They are called zombie domains. > > So I would consider to keep the same domain around. The advantage is you > can call "xl destroy" again to retry the operation. > > > > > Thanks, > > Jens > > --- > > xen/arch/arm/tee/ffa.c | 36 ++++++++++++++++++++++++++++++++++++ > > 1 file changed, 36 insertions(+) > > > > diff --git a/xen/arch/arm/tee/ffa.c b/xen/arch/arm/tee/ffa.c > > index 183528d13388..9c596462a8a2 100644 > > --- a/xen/arch/arm/tee/ffa.c > > +++ b/xen/arch/arm/tee/ffa.c > > @@ -1539,6 +1539,7 @@ static bool is_in_subscr_list(const uint16_t *subscr, > > uint16_t start, > > static int ffa_domain_teardown(struct domain *d) > > { > > struct ffa_ctx *ctx = d->arch.tee; > > + struct ffa_shm_mem *shm, *tmp; > > unsigned int n; > > int32_t res; > > > > @@ -1564,10 +1565,45 @@ static int ffa_domain_teardown(struct domain *d) > > printk(XENLOG_ERR "ffa: Failed to report destruction of vm_id > > %u to %u: res %d\n", > > get_vm_id(d), subscr_vm_destroyed[n], res); > > } > > + /* > > + * If this function is called again due to -ERESTART below, make sure > > + * not to send the FFA_MSG_SEND_VM_DESTROYED's. > > + */ > > + subscr_vm_destroyed_count = 0; > > AFAICT, this variable is global. So wouldn't you effectively break other > domain if let say the unmapping error is temporary? You're right! I'll have to change this part a bit. > > > > > if ( ctx->rx ) > > rxtx_unmap(ctx); > > > > + > > + list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list) > > + { > > + register_t handle_hi; > > + register_t handle_lo; > > + > > + uint64_to_regpair(&handle_hi, &handle_lo, shm->handle); > > + res = ffa_mem_reclaim(handle_lo, handle_hi, 0); > > Is this call expensive? If so, we may need to handle continuation here. > > > + if ( res ) > > + { > > + printk(XENLOG_INFO, "ffa: Failed to reclaim handle %#lx : > > %d\n", > > + shm->handle, res); > > I think you want to use XENLOG_G_INFO to use the guest ratelimit. Also, > I would suggest to print the domain ID in the logs (see '%pd'). Thanks for the tip, I'll update accordingly here and at the other places. > > > > + } > > + else > > + { > > + printk(XENLOG_DEBUG, "ffa: Reclaimed handle %#lx\n", > > shm->handle); > > Same here. You want to use XENLOG_G_DEBUG and print the domain ID. > > > + ctx->shm_count--; > > + list_del(&shm->list); > > + } > > + } > > NIT: New line here please for clarity. OK > > > + if ( !list_empty(&ctx->shm_list) ) > > + { > > + printk(XENLOG_INFO, "ffa: Remaining unclaimed handles, > > retrying\n"); > > Same as the other printks. > > > + /* > > + * TODO: add a timeout where we either panic or let the guest be > > + * fully destroyed. > > + */ > Timeout with proper handling would be a solution. I am not sure about > panic-ing. Do you think the TEE would be in a bad state if we can't > release memory? No, that's not likely. Thanks, Jens > > > + return -ERESTART; > > + } > > + > > XFREE(d->arch.tee); > > > > return 0; > > Cheers, > > -- > Julien Grall