Hi Julien,

> -----Original Message-----
> From: Julien Grall <jul...@xen.org>
> Sent: 2022年3月2日 18:25
> To: Wei Chen <wei.c...@arm.com>; Stefano Stabellini
> <sstabell...@kernel.org>
> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
> <bertrand.marq...@arm.com>; Penny Zheng <penny.zh...@arm.com>; Henry Wang
> <henry.w...@arm.com>; nd <n...@arm.com>
> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> Hi Wei,
> 
> On 02/03/2022 06:43, Wei Chen wrote:
> >> -----Original Message-----
> >> From: Julien Grall <jul...@xen.org>
> >> Sent: 2022年3月1日 21:17
> >> To: Wei Chen <wei.c...@arm.com>; Stefano Stabellini
> >> <sstabell...@kernel.org>
> >> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
> >> <bertrand.marq...@arm.com>; Penny Zheng <penny.zh...@arm.com>; Henry
> Wang
> >> <henry.w...@arm.com>; nd <n...@arm.com>
> >> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> >>
> >> On 01/03/2022 06:29, Wei Chen wrote:
> >>> Hi Julien,
> >>
> >> Hi,
> >>
> >>>> -----Original Message-----
> >>>> From: Julien Grall <jul...@xen.org>
> >>>> Sent: 2022年2月26日 4:12
> >>>> To: Wei Chen <wei.c...@arm.com>; Stefano Stabellini
> >>>> <sstabell...@kernel.org>
> >>>> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
> >>>> <bertrand.marq...@arm.com>; Penny Zheng <penny.zh...@arm.com>; Henry
> >> Wang
> >>>> <henry.w...@arm.com>; nd <n...@arm.com>
> >>>> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> >>>>
> >>>> Hi Wei,
> >>>>
> >>>> On 25/02/2022 10:48, Wei Chen wrote:
> >>>>>>>        Armv8-R64 can support max to 256 MPU regions. But that's
> just
> >>>>>> theoretical.
> >>>>>>>        So we don't want to define `pr_t mpu_regions[256]`, this is
> a
> >>>> memory
> >>>>>> waste
> >>>>>>>        in most of time. So we decided to let the user specify
> through
> >> a
> >>>>>> Kconfig
> >>>>>>>        option. `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` default
> value
> >> can
> >>>> be
> >>>>>> `32`,
> >>>>>>>        it's a typical implementation on Armv8-R64. Users will
> >> recompile
> >>>> Xen
> >>>>>> when
> >>>>>>>        their platform changes. So when the MPU changes,
> respecifying
> >> the
> >>>>>> MPU
> >>>>>>>        protection regions number will not cause additional
> problems.
> >>>>>>
> >>>>>> I wonder if we could probe the number of MPU regions at runtime and
> >>>>>> dynamically allocate the memory needed to store them in arch_vcpu.
> >>>>>>
> >>>>>
> >>>>> We have considered to used a pr_t mpu_regions[0] in arch_vcpu. But
> it
> >>>> seems
> >>>>> we will encounter some static allocated arch_vcpu problems and
> sizeof
> >>>> issue.
> >>>>
> >>>> Does it need to be embedded in arch_vcpu? If not, then we could
> >> allocate
> >>>> memory outside and add a pointer in arch_vcpu.
> >>>>
> >>>
> >>> We had thought to use a pointer in arch_vcpu instead of embedding
> >> mpu_regions
> >>> into arch_vcpu. But we noticed that arch_vcpu has a
> __cacheline_aligned
> >>> attribute, this may be because of arch_vcpu will be used very
> frequently
> >>> in some critical path. So if we use the pointer for mpu_regions, may
> >> cause
> >>> some cache miss in these critical path, for example, in context_swtich.
> >>
> >>   From my understanding, the idea behind ``cacheline_aligned`` is to
> >> avoid the struct vcpu to be shared with other datastructure. Otherwise
> >> you may end up to have two pCPUs to frequently write the same cacheline
> >> which is not ideal.
> >>
> >> arch_vcpu should embbed anything that will be accessed often (e.g.
> >> entry/exit) to certain point. For instance, not everything related to
> >> the vGIC are embbed in the vCPU/Domain structure.
> >>
> >> I am a bit split regarding the mpu_regions. If they are mainly used in
> >> the context_switch() then I would argue this is a premature
> optimization
> >> because the scheduling decision is probably going to take a lot more
> >> time than the context switch itself.
> >
> > mpu_regions in arch_vcpu are used to save guest's EL1 MPU context. So,
> yes,
> > they are mainly used in context_switch. In terms of the number of
> registers,
> > it will save/restore more work than the original V8A. And on V8R we also
> need
> > to keep most of the original V8A save/restore work. So it will take
> longer
> > than the original V8A context_switch. And I think this is due to
> architecture's
> > difference. So it's impossible for us not to save/restore EL1 MPU region
> > registers in context_switch. And we have done some optimization for EL1
> MPU
> > save/restore:
> > 1. Assembly code for EL1 MPU context_switch
> 
> This discussion reminds me when KVM decided to rewrite their context
> switch from assembly to C. The outcome was the compiler is able to do a
> better job than us when it comes to optimizing.
> 
> With a C version, we could also share the save/restore code with 32-bit
> and it is easier to read/maintain.
> 
> So I would suggest to run some numbers to check if it really worth
> implementing the MPU save/restore in assembly.
> 

It's interesting to hear KVM guys have similar discussion. Yes, if the
gains of assembly code is not very obvious, then reusing the code for 32-bit
would be more important. As our current platform (FVP) could not do very
precise performance measurement. I want to keep current assembly code there,
when we have a platform that can do such measurement we can have a thread
to discuss it.

> > 2. Use real MPU regions number instead of
> CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS
> >     in context_switch. CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS is defined
> the Max
> >     supported EL1 MPU regions for this Xen image. All platforms that
> implement
> >     EL1 MPU regions in this range can work well with this Xen Image. But
> if the
> >     implemented EL1 MPU region number exceeds
> CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS,
> >     this Xen image could not work well on this platform.
> 
> This sounds similar to the GICv3. The number of LRs depends on the
> hardware. See how we dealt with it in gicv3_save_lrs().
>

This is a good suggestion, we will check the GIC code.

> >
> >>
> >> Note that for the P2M we already have that indirection because it is
> >> embbed in the struct domain.
> >
> > It's different with V8A P2M case. In V8A context_switch we just need to
> > save/restore VTTBR, we don't need to do P2M table walk. But on V8R, we
> > need to access valid mpu_regions for save/restore.
> 
> The save/restore for the P2M is a bit more complicated than simply
> save/restore the VTTBR. But yes, I agree the code for the MPU will
> likely be more complicated.
> 
> >
> >>
> >> This raises one question, why is the MPUs regions will be per-vCPU
> >> rather per domain?
> >>
> >
> > Because there is a EL1 MPU component for each pCPU. We can't assume
> guest
> > to use the same EL1 MPU configuration for all vCPU.
> 
> Ah. Sorry, I thought you were referring to whatever Xen will use to
> prevent the guest accessing outside of its designated region.
> 

NP : )

Thanks,
Wei Chen

> Cheers,
> 
> --
> Julien Grall

Reply via email to