RE: [PATCH] doc/arm: take care restore order of GICR_* in ITS restore

2021-07-22 Thread Jianyong Wu
Hi Marc,

Got it! Thanks for your explanation.

Thanks
Jianyong

> -Original Message-
> From: Marc Zyngier 
> Sent: Thursday, July 22, 2021 4:11 PM
> To: Jianyong Wu 
> Cc: James Morse ; Andre Przywara
> ; lushenm...@huawei.com;
> k...@vger.kernel.org; kvmarm@lists.cs.columbia.edu; linux-
> d...@vger.kernel.org; linux-ker...@vger.kernel.org; Justin He
> 
> Subject: Re: [PATCH] doc/arm: take care restore order of GICR_* in ITS
> restore
>
> On Thu, 22 Jul 2021 03:49:52 +0100,
> Jianyong Wu  wrote:
> >
> > Hello Marc,
> >
> > > -Original Message-
> > > From: Marc Zyngier 
> > > Sent: Wednesday, July 21, 2021 5:54 PM
> > > To: Jianyong Wu 
> > > Cc: James Morse ; Andre Przywara
> > > ; lushenm...@huawei.com;
> > > k...@vger.kernel.org; kvmarm@lists.cs.columbia.edu; linux-
> > > d...@vger.kernel.org; linux-ker...@vger.kernel.org; Justin He
> > > 
> > > Subject: Re: [PATCH] doc/arm: take care restore order of GICR_* in
> > > ITS restore
> > >
> > > On Wed, 21 Jul 2021 10:20:19 +0100,
> > > Jianyong Wu  wrote:
> > > >
> > > > When restoring the GIC/ITS, GICR_CTLR must be restored after
> > > > GICR_PROPBASER and GICR_PENDBASER. That is important, as both
> > > > GICR_PROPBASER and GICR_PENDBASER will fail to be loaded if LPIs
> > > > have already been enabled in GICR_CTLR. Keeping the restore order
> > > > above avoids that issue. Calling it out in the doc is very helpful
> > > > and may avoid lots of debugging work.
> > >
> > > But that's something that is already mandated by the architecture, isn't 
> > > it?
> > > See "5.1 LPIs" in the architecture spec:
> > >
> > > 
> > >
> > > If GICR_PROPBASER is updated when GICR_CTLR.EnableLPIs == 1, the
> > > effects are UNPREDICTABLE.
> > >
> > > [...]
> > >
> > > If GICR_PENDBASER is updated when GICR_CTLR.EnableLPIs == 1, the
> > > effects are UNPREDICTABLE.
> > >
> >
> > I think this "UNPREDICTABLE" related with the "physical machine". Am I
> > right?
>
> No, you are unfortunately wrong. The architecture applies to *any*
> implementation, and makes no distinction between a HW implementation and
> a SW version. This is why we call it an architecture, and not an
> implementation.
>
> > In a virtualization environment, the kernel gives a definite answer:
> > we should not enable GICR_CTLR.EnableLPIs before restoring
> > GICR_PROPBASER (or GICR_PENDBASER) when restoring the GIC ITS from the
> > VMM, see [1]. Thus, should we consider the virtualization environment
> > as a special case?
>
> Absolutely not. If you start special casing things, then we end up having
> stupidly designed SW that tries to do stupid things based on the supposed
> properties of an implementation.
>
> We follow the architecture, all the architecture, nothing but the 
> architecture.
> The architecture is the only barrier between insanity and pure madness! ;-)
>
> >
> > [1] linux/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> > static void vgic_mmio_write_propbase(struct kvm_vcpu *vcpu,
> >  gpa_t addr, unsigned int len,
> >  unsigned long val) {
> > struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> > struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> > u64 old_propbaser, propbaser;
> >
> > /* Storing a value with LPIs already enabled is undefined */
> > if (vgic_cpu->lpis_enabled)
> >return;
> > ...
> > }
>
> Do you see how the kernel does exactly what the architecture says we can do?
> Ignoring the write is a perfectly valid implementation of UNPREDICTABLE.
>
> So what we do is completely in line with the architecture. As such, no need to
> document it any further, everything is already where it should be. If
> someone tries to write code dealing with the GIC without understanding the
> architecture, no amount of additional documentation will help.
>
> Thanks,
>
>   M.
>
> --
> Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 00/12] KVM: Add idempotent controls for migrating system counter state

2021-07-22 Thread Oliver Upton
On Wed, Jul 21, 2021 at 8:28 AM Andrew Jones  wrote:
>
> On Fri, Jul 16, 2021 at 09:26:17PM +, Oliver Upton wrote:
> > KVM's current means of saving/restoring system counters is plagued with
> > temporal issues. At least on ARM64 and x86, we migrate the guest's
> > system counter by-value through the respective guest system register
> > values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
> > brittle as the state is not idempotent: the host system counter is still
> > oscillating between the attempted save and restore. Furthermore, VMMs
> > may wish to transparently live migrate guest VMs, meaning that they
> > include the elapsed time due to live migration blackout in the guest
> > system counter view. The VMM thread could be preempted for any number of
> > reasons (scheduler, L0 hypervisor under nested) between the time that
> > it calculates the desired guest counter value and when KVM actually sets
> > this counter state.
> >
> > Despite the value-based interface that we present to userspace, KVM
> > actually has idempotent guest controls by way of system counter offsets.
> > We can avoid all of the issues associated with a value-based interface
> > by abstracting these offset controls in new ioctls. This series
> > introduces new vCPU device attributes to provide userspace access to the
> > vCPU's system counter offset.
> >
> > Patch 1 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
> > ioctls to provide userspace with a (host_tsc, realtime) instant. This is
> > essential for a VMM to perform precise migration of the guest's system
> > counters.
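
As a rough illustration of why the (host_tsc, realtime) pair makes the
restore idempotent, a VMM could derive the destination TSC offset along
these lines (a sketch with a hypothetical helper name, not the series'
uAPI):

#include <stdint.h>

/* src_* values were saved on the source host; dst_* values come from the
 * extended KVM_GET_CLOCK on the destination. */
static uint64_t guest_tsc_offset(uint64_t src_guest_tsc,
                                 uint64_t src_realtime_ns,
                                 uint64_t dst_host_tsc,
                                 uint64_t dst_realtime_ns,
                                 uint64_t tsc_hz)
{
        /* Guest ticks that should elapse across the migration blackout. */
        uint64_t blackout_ns = dst_realtime_ns - src_realtime_ns;
        uint64_t blackout_ticks =
                (uint64_t)((__uint128_t)blackout_ns * tsc_hz / 1000000000ULL);

        /* The offset, not the value, gets written back, so a preempted
         * VMM thread no longer skews the restored counter:
         * guest_tsc = host_tsc + offset at any later instant. */
        return src_guest_tsc + blackout_ticks - dst_host_tsc;
}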
> >
> > Patches 2-3 add support for x86 by shoehorning the new controls into the
> > pre-existing synchronization heuristics.
> >
> > Patches 4-5 implement a test for the new additions to
> > KVM_{GET,SET}_CLOCK.
> >
> > Patches 6-7 implement a test for the tsc offset attribute introduced in
> > patch 3.
> >
> > Patch 8 adds a device attribute for the arm64 virtual counter-timer
> > offset.
> >
> > Patch 9 extends the test from patch 7 to cover the arm64 virtual
> > counter-timer offset.
> >
> > Patch 10 adds a device attribute for the arm64 physical counter-timer
> > offset. Currently, this is implemented as a synthetic register, forcing
> > the guest to trap to the host and emulating the offset in the fast exit
> > path. Later down the line we will have hardware with FEAT_ECV, which
> > allows the hypervisor to perform physical counter-timer offsetting in
> > hardware (CNTPOFF_EL2).
> >
> > Patch 11 extends the test from patch 7 to cover the arm64 physical
> > counter-timer offset.
> >
> > Patch 12 introduces a benchmark to measure the overhead of emulation in
> > patch 10.
> >
> > Physical counter benchmark
> > --------------------------
> >
> > The following data was collected by running 1 iterations of the
> > benchmark test from Patch 6 on an Ampere Mt. Jade reference server, a 2S
> > machine with 2 80-core Ampere Altra SoCs. Measurements were collected
> > for both VHE and nVHE operation using the `kvm-arm.mode=` command-line
> > parameter.
> >
> > nVHE
> > ----
> >
> > +--------------------+--------+---------+
> > |       Metric       | Native | Trapped |
> > +--------------------+--------+---------+
> > | Average            | 54ns   | 148ns   |
> > | Standard Deviation | 124ns  | 122ns   |
> > | 95th Percentile    | 258ns  | 348ns   |
> > +--------------------+--------+---------+
> >
> > VHE
> > ---
> >
> > +--------------------+--------+---------+
> > |       Metric       | Native | Trapped |
> > +--------------------+--------+---------+
> > | Average            | 53ns   | 152ns   |
> > | Standard Deviation | 92ns   | 94ns    |
> > | 95th Percentile    | 204ns  | 307ns   |
> > +--------------------+--------+---------+
> >
> > This series applies cleanly to the following commit:
> >
> > 1889228d80fe ("KVM: selftests: smm_test: Test SMM enter from L2")
> >
> > v1 -> v2:
> >   - Reimplemented as vCPU device attributes instead of a distinct ioctl.
> >   - Added the (realtime, host_tsc) instant support to
> > KVM_{GET,SET}_CLOCK
> >   - Changed the arm64 implementation to broadcast counter offset values
> > to all vCPUs in a guest. This upholds the architectural expectations
> > of a consistent counter-timer across CPUs.
> >   - Fixed a bug with traps in VHE mode. We now configure traps on every
> > transition into a guest to handle differing VMs (trapped, emulated).
> >
>
> Oops, I see there's a v3 of this series. I'll switch to reviewing that. I
> think my comments / r-b's apply to that version as well though.

Hey Drew,

Thanks for the review. I'll address your comments from both v2 and v3
in the next series.

--
Thanks,
Oliver
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure

2021-07-22 Thread Marc Zyngier
On Tue, 20 Jul 2021 16:49:45 +0100,
Quentin Perret  wrote:
> 
> On Tuesday 20 Jul 2021 at 14:15:56 (+0100), Marc Zyngier wrote:
> > On Tue, 20 Jul 2021 12:13:20 +0100,
> > Quentin Perret  wrote:
> > > 
> > > On Thursday 15 Jul 2021 at 17:31:47 (+0100), Marc Zyngier wrote:
> > > > +struct s2_walk_data {
> > > > +   kvm_pte_t   pteval;
> > > > +   u32 level;
> > > > +};
> > > > +
> > > > +static int s2_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > > > +enum kvm_pgtable_walk_flags flag, void * const arg)
> > > > +{
> > > > +   struct s2_walk_data *data = arg;
> > > > +
> > > > +   data->level = level;
> > > > +   data->pteval = *ptep;
> > > > +   return 0;
> > > > +}
> > > > +
> > > > +/* Assumes mmu_lock taken */
> > > > +static bool __check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
> > > > +{
> > > > +   struct s2_walk_data data;
> > > > +   struct kvm_pgtable_walker walker = {
> > > > +   .cb = s2_walker,
> > > > +   .flags  = KVM_PGTABLE_WALK_LEAF,
> > > > +   .arg= &data,
> > > > +   };
> > > > +
> > > > +   kvm_pgtable_walk(vcpu->arch.hw_mmu->pgt, ALIGN_DOWN(ipa, PAGE_SIZE),
> > > > +PAGE_SIZE, &walker);
> > > > +
> > > > +   /* Must be a PAGE_SIZE mapping with our annotation */
> > > > +   return (BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(data.level)) == PAGE_SIZE &&
> > > > +   data.pteval == MMIO_NOTE);
> > > 
> > > Nit: you could do this check in the walker directly and check the return
> > > value of kvm_pgtable_walk() instead. That would allow to get rid of
> > > struct s2_walk_data.
> > > 
> > > Also, though the compiler might be able to optimize, maybe simplify the
> > > level check to level == (KVM_PGTABLE_MAX_LEVELS - 1)?
> > 
> > Yup, all good points. I guess I could do the same in my other series
> > that parses the userspace PT to extract the level.
> 
> Well, actually, let me take that back. I think something like you have
> would be useful, but in pgtable.c directly and re-usable for stage-1 and
> stage-2 walks. Maybe something like the below (totally untested)?
> 
> I could use such a walker in several places as well in the memory
> ownership series:
> 
>  - following the idea of [1], I could remove the
>kvm_pgtable_stage2_find_range() function entirely;
> 
>  - [2] defines 2 custom walkers that do nothing but walk host stage-2
>and hyp stage-1 page-tables to check permissions and such --  they
>could be removed/re-implemented easily as well.
> 
> And you seem to need something similar here, so clearly there is a need.
> WDYT?

So FWIW, I've now pushed out an updated series for the THP changes[1],
and you will find a similar patch at the base of the branch. Please
have a look and let me know what you think!
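
For context, the generic leaf walker being discussed ends up looking
roughly like the sketch below, built from the same pieces as the
s2_walker above (an illustration, not necessarily the exact code in the
branch):

struct leaf_walk_data {
	kvm_pte_t	pte;
	u32		level;
};

static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
		       enum kvm_pgtable_walk_flags flag, void * const arg)
{
	struct leaf_walk_data *data = arg;

	data->pte   = *ptep;
	data->level = level;

	return 0;
}

int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
			 kvm_pte_t *ptep, u32 *level)
{
	struct leaf_walk_data data;
	struct kvm_pgtable_walker walker = {
		.cb	= leaf_walker,
		.flags	= KVM_PGTABLE_WALK_LEAF,
		.arg	= &data,
	};
	int ret;

	ret = kvm_pgtable_walk(pgt, ALIGN_DOWN(addr, PAGE_SIZE),
			       PAGE_SIZE, &walker);
	if (!ret) {
		if (ptep)
			*ptep  = data.pte;
		if (level)
			*level = data.level;
	}

	return ret;
}

Callers such as __check_ioguard_page() can then drop their private walk
data and simply check the returned pte/level pair.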

Thanks,

M.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/mmu/mapping-levels

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH 4/5] iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM

2021-07-22 Thread Jean-Philippe Brucker
Hi Shameer,

On Wed, Jul 21, 2021 at 08:54:00AM +, Shameerali Kolothum Thodi wrote:
> > More generally I think this pinned VMID set conflicts with that of
> > stage-2-only domains (which is the default state until a guest attaches a
> > PASID table). Say you have one guest using DOMAIN_NESTED without PASID
> > table, just DMA to IPA using VMID 0x8000. Now another guest attaches a
> > PASID table and obtains the same VMID from KVM. The stage-2 translation
> > might use TLB entries from the other guest, no?  They'll both create
> > stage-2 TLB entries with {StreamWorld=NS-EL1, VMID=0x8000}
> 
> Now that we are trying to align the KVM VMID allocation algorithm with
> that of the ASID allocator [1], I attempted to use it for the SMMU pinned
> VMID allocation. But the issue you have mentioned above is still valid.
> 
> And as a solution, what I have tried now is to follow what pinned ASIDs do
> in SVA:
>  - Use an xarray for private VMIDs
>  - Get a pinned VMID from KVM for DOMAIN_NESTED with a PASID table
>  - If the new pinned VMID is in use as a private VMID, then update the
>    private VMID (a VMID update to a live STE).
> 
> This seems to work, but I still need to run more tests with this.
>
> > It's tempting to allocate all VMIDs through KVM instead, but that will
> > force a dependency on KVM to use VFIO_TYPE1_NESTING_IOMMU and might
> > break
> > existing users of that extension (though I'm not sure there are any).
> > Instead we might need to restrict the SMMU VMID bitmap to match the
> > private VMID set in KVM.
> 
> Another solution I have in mind is to make the new KVM VMID allocator common
> between SMMUv3 and KVM. This will help to avoid all the private and shared
> VMID splitting, and there is no need for live updates to the STE VMID. One
> possible drawback is a smaller number of available KVM VMIDs, but with a
> 16-bit VMID space I am not sure how much of a concern that is.

Yes I think that works too. In practice there shouldn't be many VMIDs on
the SMMU side, the feature's only enabled when a user wants to assign
devices with nesting translation (unlike ASIDs where each device in the
system gets a private ASID by default).

Note that you still need to pin all VMIDs used by the SMMU, otherwise
you'll have to update the STE after rollover.

The problem we have with VFIO_TYPE1_NESTING_IOMMU might be solved by the
upcoming deprecation of VFIO_*_IOMMU [2]. We need a specific sequence from
userspace:
1. Attach VFIO group to KVM (KVM_DEV_VFIO_GROUP_ADD)
2. Create nesting IOMMU domain and attach the group to it
   (VFIO_GROUP_SET_CONTAINER, VFIO_SET_IOMMU becomes
IOMMU_IOASID_ALLOC, VFIO_DEVICE_ATTACH_IOASID)
Currently QEMU does 2 then 1, which would cause the SMMU to allocate a
separate VMID. If we wanted to extend VFIO_TYPE1_NESTING_IOMMU with PASID
tables we'd need to mandate 1-2 and may break existing users. In the new
design we can require from the start that creating a nesting IOMMU
container through /dev/iommu *must* come with a KVM context, that way
we're sure to reuse the existing VMID.
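
For the record, the required ordering with today's uAPI boils down to the
sketch below (the IOASID calls mentioned above do not exist yet, so only
the current ioctls are shown; error handling omitted):

#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <linux/vfio.h>

/* Step 1 must come before step 2 so the SMMU driver can reuse KVM's VMID. */
static int attach_group_with_kvm(int kvm_vm_fd, const char *group_path)
{
	int container = open("/dev/vfio/vfio", O_RDWR);
	int group = open(group_path, O_RDWR);	/* e.g. "/dev/vfio/26" */

	/* 1. Attach the VFIO group to KVM (KVM_DEV_VFIO_GROUP_ADD). */
	struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VFIO };
	ioctl(kvm_vm_fd, KVM_CREATE_DEVICE, &cd);

	struct kvm_device_attr attr = {
		.group	= KVM_DEV_VFIO_GROUP,
		.attr	= KVM_DEV_VFIO_GROUP_ADD,
		.addr	= (uint64_t)(uintptr_t)&group,
	};
	ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);

	/* 2. Only then create and populate the nesting IOMMU container. */
	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);

	return container;
}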

Thanks,
Jean

[2] 
https://lore.kernel.org/linux-iommu/bn9pr11mb5433b1e4ae5b0480369f97178c...@bn9pr11mb5433.namprd11.prod.outlook.com/
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 3/3] kvm/arm: Align the VMID allocation with the arm64 ASID one

2021-07-22 Thread Will Deacon
Hi Vladimir,

On Thu, Jul 22, 2021 at 04:22:26PM +0100, Vladimir Murzin wrote:
> On 7/22/21 10:50 AM, Will Deacon wrote:
> > As an aside: I'm more and more inclined to rip out the CnP stuff given
> > that it doesn't appear to bring any benefits, but does have some clear
> > downsides. Perhaps something for next week.
> 
> Can you please clarify what you mean by "it doesn't appear to bring any
> benefits"? IIRC, Cortex-A65 implements the CnP hint and I've heard that some
> payloads have seen improvements...

Has anybody taped that out? I'd have thought building an SMT design in 2021
is a reasonably courageous thing to do.

The issue I'm getting at is that modern cores seem to advertise CnP even
if they ignore it internally, maybe because of some big/little worries?
That would be fine if it didn't introduce complexity and overhead to the
kernel, but it does and therefore I think we should rip it out (or at
least stick it behind a "default n" config option if there are some niche
users).

There are also open questions as to exactly what CnP does because the
architecture is not clear at all (for example TTBRx_EL1.CnP is permitted
to be cached in a TLB).

Cheers,

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 00/16] KVM: arm64: MMIO guard PV services

2021-07-22 Thread Marc Zyngier
On Thu, 22 Jul 2021 14:25:15 +0100,
Andrew Jones  wrote:
> 
> On Thu, Jul 22, 2021 at 11:00:26AM +0100, Marc Zyngier wrote:
> > On Wed, 21 Jul 2021 22:42:43 +0100,
> > Andrew Jones  wrote:
> > > 
> > > On Thu, Jul 15, 2021 at 05:31:43PM +0100, Marc Zyngier wrote:
> > > > KVM/arm64 currently considers that any memory access outside of a
> > > > memslot is a MMIO access. This so far has served us very well, but
> > > > obviously relies on the guest trusting the host, and especially
> > > > userspace to do the right thing.
> > > > 
> > > > As we keep on hacking away at pKVM, it becomes obvious that this trust
> > > > model is not really fit for a confidential computing environment, and
> > > > that the guest would require some guarantees that emulation only
> > > > occurs on portions of the address space that have clearly been
> > > > identified for this purpose.
> > > 
> > > This trust model is hard for me to reason about. userspace is trusted to
> > > control the life cycle of the VM, to prepare the memslots for the VM,
> > > and [presumably] identify what MMIO ranges are valid, yet it's not
> > > trusted to handle invalid MMIO accesses. I'd like to learn more about
> > > this model and the userspace involved.
> > 
> > Imagine the following scenario:
> > 
> > On top of the normal memory described as memslots (which pKVM will
> > ensure that userspace cannot access),
> 
> Ah, I didn't know that part.

Yeah, that's the crucial bit. By default, pKVM guests do not share any
memory with anyone, so the memslots are made inaccessible from both
the VMM and the host kernel. The guest has to explicitly change the
state of the memory it wants to share back with the host for things
like IO.

> 
> > a malicious userspace describes
> > to the guest another memory region in a firmware table and does not
> > back it with a memslot.
> > 
> > The hypervisor cannot validate this firmware description (imagine
> > doing ACPI and DT parsing at EL2...), so the guest starts using this
> > "memory" for something, and data slowly trickles all the way to EL0.
> > Not what you wanted.
> 
> Yes, I see that now, in light of the above.
> 
> > 
> > To ensure that this doesn't happen, we reverse the problem: userspace
> > (and ultimately the EL1 kernel) doesn't get involved on a translation
> > fault outside of a memslot *unless* the guest has explicitly asked for
> > that page to be handled as a MMIO. With that, we have a full
> > description of the IPA space contained in the S2 page tables:
> > 
> > - memory described via a memslot,
> > - directly mapped device (GICv2, for example),
> > - MMIO exposed for emulation
> > 
> > and anything else is an invalid access that results in an abort.
> > 
> > Does this make sense to you?
> 
> Now I understand better, but if we're worried about malicious userspaces,
> then how do we protect the guest from "bad" MMIO devices that have been
> described to it? The guest can request access to those using this new
> mechanism.

We don't try to do anything about a malicious IO device. Any IO should
be considered as malicious, and you don't want to give it anything in
clear-text if it is supposed to be secret.

Eventually, you'd probably want directly assigned devices that can
attest to the guest that they are what they pretend to be, but that's
a long way away. For now, we only want to enable virtio with a reduced
level of trust (bounce buffering via shared pages for DMA, and reduced
MMIO exposure).

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 3/3] kvm/arm: Align the VMID allocation with the arm64 ASID one

2021-07-22 Thread Vladimir Murzin
On 7/22/21 10:50 AM, Will Deacon wrote:
> As an aside: I'm more and more inclined to rip out the CnP stuff given
> that it doesn't appear to bring any benefits, but does have some clear
> downsides. Perhaps something for next week.

Can you please clarify what you mean by "it doesn't appear to bring any
benefits"? IIRC, Cortex-A65 implements the CnP hint and I've heard that some
payloads have seen improvements...

Cheers
Vladimir 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 00/16] KVM: arm64: MMIO guard PV services

2021-07-22 Thread Andrew Jones
On Thu, Jul 22, 2021 at 11:00:26AM +0100, Marc Zyngier wrote:
> On Wed, 21 Jul 2021 22:42:43 +0100,
> Andrew Jones  wrote:
> > 
> > On Thu, Jul 15, 2021 at 05:31:43PM +0100, Marc Zyngier wrote:
> > > KVM/arm64 currently considers that any memory access outside of a
> > > memslot is a MMIO access. This so far has served us very well, but
> > > obviously relies on the guest trusting the host, and especially
> > > userspace to do the right thing.
> > > 
> > > As we keep on hacking away at pKVM, it becomes obvious that this trust
> > > model is not really fit for a confidential computing environment, and
> > > that the guest would require some guarantees that emulation only
> > > occurs on portions of the address space that have clearly been
> > > identified for this purpose.
> > 
> > This trust model is hard for me to reason about. userspace is trusted to
> > control the life cycle of the VM, to prepare the memslots for the VM,
> > and [presumably] identify what MMIO ranges are valid, yet it's not
> > trusted to handle invalid MMIO accesses. I'd like to learn more about
> > this model and the userspace involved.
> 
> Imagine the following scenario:
> 
> On top of the normal memory described as memslots (which pKVM will
> ensure that userspace cannot access),

Ah, I didn't know that part.

> a malicious userspace describes
> to the guest another memory region in a firmware table and does not
> back it with a memslot.
> 
> The hypervisor cannot validate this firmware description (imagine
> doing ACPI and DT parsing at EL2...), so the guest starts using this
> "memory" for something, and data slowly trickles all the way to EL0.
> Not what you wanted.

Yes, I see that now, in light of the above.

> 
> To ensure that this doesn't happen, we reverse the problem: userspace
> (and ultimately the EL1 kernel) doesn't get involved on a translation
> fault outside of a memslot *unless* the guest has explicitly asked for
> that page to be handled as a MMIO. With that, we have a full
> description of the IPA space contained in the S2 page tables:
> 
> - memory described via a memslot,
> - directly mapped device (GICv2, for example),
> - MMIO exposed for emulation
> 
> and anything else is an invalid access that results in an abort.
> 
> Does this make sense to you?

Now I understand better, but if we're worried about malicious userspaces,
then how do we protect the guest from "bad" MMIO devices that have been
described to it? The guest can request access to those using this new
mechanism.

Thanks,
drew

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 3/3] kvm/arm: Align the VMID allocation with the arm64 ASID one

2021-07-22 Thread Quentin Perret
On Thursday 22 Jul 2021 at 06:45:14 (+), Shameerali Kolothum Thodi wrote:
> > From: Will Deacon [mailto:w...@kernel.org]
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > index 4b60c0056c04..a02c4877a055 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > @@ -106,8 +106,7 @@ int kvm_host_prepare_stage2(void *mem_pgt_pool,
> > void *dev_pgt_pool)
> > >   mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd);
> > >   mmu->arch = &host_kvm.arch;
> > >   mmu->pgt = &host_kvm.pgt;
> > > - mmu->vmid.vmid_gen = 0;
> > > - mmu->vmid.vmid = 0;
> > > + atomic64_set(&mmu->vmid.id, 0);
> > 
> > I think this is the first atomic64 use in the EL2 object, which may pull in
> > some fatal KCSAN instrumentation. Quentin, have you run into this before?
> > 
> > Might be simple just to zero-initialise mmu for now, if it isn't already.
> 
> I will check that.

Yes, I think what saves us here is that, AFAICT, arm64 doesn't support
KCSAN yet. But the day it does, this should fail to link (hopefully)
because of out-of-line calls into e.g. __kasan_check_write().

So yes, a simple zeroing here is probably preferable.
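
Something like the sketch below, i.e. rely on the static zero-initialisation
(or spell it out without an atomic helper), so nothing instrumented is
pulled into the EL2 object:

	/* host_kvm is static, so the field is already zero; if an explicit
	 * statement is preferred, plain initialisation avoids atomic64_set()
	 * and whatever instrumentation it might drag in: */
	mmu->vmid.id = (atomic64_t)ATOMIC64_INIT(0);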

Thanks,
Quentin
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 00/16] KVM: arm64: MMIO guard PV services

2021-07-22 Thread Marc Zyngier
On Wed, 21 Jul 2021 22:42:43 +0100,
Andrew Jones  wrote:
> 
> On Thu, Jul 15, 2021 at 05:31:43PM +0100, Marc Zyngier wrote:
> > KVM/arm64 currently considers that any memory access outside of a
> > memslot is a MMIO access. This so far has served us very well, but
> > obviously relies on the guest trusting the host, and especially
> > userspace to do the right thing.
> > 
> > As we keep on hacking away at pKVM, it becomes obvious that this trust
> > model is not really fit for a confidential computing environment, and
> > that the guest would require some guarantees that emulation only
> > occurs on portions of the address space that have clearly been
> > identified for this purpose.
> 
> This trust model is hard for me to reason about. userspace is trusted to
> control the life cycle of the VM, to prepare the memslots for the VM,
> and [presumably] identify what MMIO ranges are valid, yet it's not
> trusted to handle invalid MMIO accesses. I'd like to learn more about
> this model and the userspace involved.

Imagine the following scenario:

On top of the normal memory described as memslots (which pKVM will
ensure that userspace cannot access), a malicious userspace describes
to the guest another memory region in a firmware table and does not
back it with a memslot.

The hypervisor cannot validate this firmware description (imagine
doing ACPI and DT parsing at EL2...), so the guest starts using this
"memory" for something, and data slowly trickles all the way to EL0.
Not what you wanted.

To ensure that this doesn't happen, we reverse the problem: userspace
(and ultimately the EL1 kernel) doesn't get involved on a translation
fault outside of a memslot *unless* the guest has explicitly asked for
that page to be handled as a MMIO. With that, we have a full
description of the IPA space contained in the S2 page tables:

- memory described via a memslot,
- directly mapped device (GICv2, for example),
- MMIO exposed for emulation

and anything else is an invalid access that results in an abort.
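
As a concrete illustration, the "explicitly asked" part boils down to a
guest-side hypercall per emulated page, along the lines of the sketch below
(the SMCCC function ID name is a placeholder for whatever the series
actually defines):

#include <linux/arm-smccc.h>

/* Sketch: the guest registers one IPA page for MMIO emulation before
 * touching it; anything it has not registered keeps faulting. */
static int mmio_guard_map(phys_addr_t ipa)
{
	struct arm_smccc_res res;

	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID, /* placeholder */
			     ipa, &res);

	return res.a0 == SMCCC_RET_SUCCESS ? 0 : -EINVAL;
}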

Does this make sense to you?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 3/3] kvm/arm: Align the VMID allocation with the arm64 ASID one

2021-07-22 Thread Will Deacon
On Thu, Jul 22, 2021 at 06:45:14AM +, Shameerali Kolothum Thodi wrote:
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c
> > b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > index 83dc3b271bc5..42df9931ed9a 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > @@ -140,10 +140,10 @@ void __kvm_flush_cpu_context(struct
> > kvm_s2_mmu *mmu)
> > >   __tlb_switch_to_host();
> > >  }
> > >
> > > -void __kvm_flush_vm_context(void)
> > > +void __kvm_tlb_flush_local_all(void)
> > >  {
> > > - dsb(ishst);
> > > - __tlbi(alle1is);
> > > + dsb(nshst);
> > > + __tlbi(alle1);
> > >
> > >   /*
> > >* VIPT and PIPT caches are not affected by VMID, so no maintenance
> > > @@ -155,7 +155,7 @@ void __kvm_flush_vm_context(void)
> > >*
> > >*/
> > >   if (icache_is_vpipt())
> > > - asm volatile("ic ialluis");
> > > + asm volatile("ic iallu" : : );
> > >
> > > - dsb(ish);
> > > + dsb(nsh);
> > 
> > Hmm, I'm wondering whether having this local stuff really makes sense for
> > VMIDs. For ASIDs, where rollover can be frequent and TLBI could result in
> > IPI on 32-bit, the local option was important, but here rollover is less
> > frequent, DVM is relied upon to work and the cost of a hypercall is
> > significant with nVHE.
> > 
> > So I do think you could simplify patch 2 slightly to drop the
> > flush_pending and just issue inner-shareable invalidation on rollover.
> > With that, it might also make it straightforward to clear active_asids
> > when scheduling out a vCPU, which would solve the other problem I
> > mentioned
> > about unnecessarily reserving a bunch of the VMID space.
> 
> Ok. I will try out the above suggestion. Hopefully it will be acceptable for
> 8-bit VMID systems as well, as there is a higher chance of rollover there,
> especially when we introduce pinned VMIDs (I am not sure such platforms care
> about pinned VMIDs or not. If not, we could limit pinned VMIDs to 16-bit
> systems).

So I woke up at 3am in a cold sweat after dreaming about this code.

I think my suggestion above still stands for the VMID allocator, but
interestingly, it would _not_ be valid for the ASID allocator because there
the ASID is part of the active context and so, during the window where the
active_asid is out of sync with the TTBR, receiving a broadcast TLBI from a
concurrent rollover wouldn't be enough to knock out the old ASID from the
TLB despite it subsequently being made available for reallocation. So the
local TLB invalidation is not just a performance hint as I said; it's crucial
to the way the thing works (and this is also why the CnP code has to switch
to the reserved TTBR0).
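
Concretely, the simplification suggested above would make the VMID rollover
path look roughly like this (a sketch, not the posted patch):

	/* On rollover: rebuild the vmid_map bitmap and do one broadcast
	 * invalidation, instead of marking CPUs in a flush_pending mask
	 * and flushing locally on the next vcpu_load. */
	flush_context();

	dsb(ishst);
	__tlbi(alle1is);	/* stale guest entries, inner shareable */
	dsb(ish);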

As an aside: I'm more and more inclined to rip out the CnP stuff given
that it doesn't appear to bring any benefits, but does have some clear
downsides. Perhaps something for next week.

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH V9 01/18] perf/core: Use static_call to optimize perf_guest_info_callbacks

2021-07-22 Thread Zhu Lingshan
From: Like Xu 

For "struct perf_guest_info_callbacks", the two fields "is_in_guest"
and "is_user_mode" are replaced with a new multiplexed member named
"state", and the "get_guest_ip" field will be renamed to "get_ip".

For arm64, xen and kvm/x86, the application of DEFINE_STATIC_CALL_RET0
could make all that perf_guest_cbs stuff suck less. For arm, csky, nds32,
and riscv, just applied some renamed refactoring.
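
In other words, a consumer now does something like the following (this just
restates what the arm hunk below does, with the state bits replacing the two
predicates):

	unsigned int state = perf_guest_cbs ? perf_guest_cbs->state() : 0;

	if (state) {				/* non-zero: a guest was interrupted */
		if (state & PERF_GUEST_USER)
			misc |= PERF_RECORD_MISC_GUEST_USER;
		else
			misc |= PERF_RECORD_MISC_GUEST_KERNEL;
	}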

Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Guo Ren 
Cc: Nick Hu 
Cc: Paul Walmsley 
Cc: Boris Ostrovsky 
Cc: linux-arm-ker...@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-c...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: xen-de...@lists.xenproject.org
Suggested-by: Peter Zijlstra (Intel) 
Original-by: Peter Zijlstra (Intel) 
Signed-off-by: Like Xu 
Signed-off-by: Zhu Lingshan 
Reviewed-by: Boris Ostrovsky 
---
 arch/arm/kernel/perf_callchain.c   | 16 +++-
 arch/arm64/kernel/perf_callchain.c | 29 +-
 arch/arm64/kvm/perf.c  | 22 -
 arch/csky/kernel/perf_callchain.c  |  4 +--
 arch/nds32/kernel/perf_event_cpu.c | 16 +++-
 arch/riscv/kernel/perf_callchain.c |  4 +--
 arch/x86/events/core.c | 39 --
 arch/x86/events/intel/core.c   |  7 +++---
 arch/x86/include/asm/kvm_host.h|  2 +-
 arch/x86/kvm/pmu.c |  2 +-
 arch/x86/kvm/x86.c | 37 +++-
 arch/x86/xen/pmu.c | 33 ++---
 include/linux/perf_event.h | 12 ++---
 kernel/events/core.c   |  9 +++
 14 files changed, 144 insertions(+), 88 deletions(-)

diff --git a/arch/arm/kernel/perf_callchain.c b/arch/arm/kernel/perf_callchain.c
index 3b69a76d341e..1ce30f86d6c7 100644
--- a/arch/arm/kernel/perf_callchain.c
+++ b/arch/arm/kernel/perf_callchain.c
@@ -64,7 +64,7 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry, 
struct pt_regs *regs
 {
struct frame_tail __user *tail;
 
-   if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+   if (perf_guest_cbs && perf_guest_cbs->state()) {
/* We don't support guest os callchain now */
return;
}
@@ -100,7 +100,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx 
*entry, struct pt_regs *re
 {
struct stackframe fr;
 
-   if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+   if (perf_guest_cbs && perf_guest_cbs->state()) {
/* We don't support guest os callchain now */
return;
}
@@ -111,8 +111,8 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx 
*entry, struct pt_regs *re
 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
-   if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
-   return perf_guest_cbs->get_guest_ip();
+   if (perf_guest_cbs && perf_guest_cbs->state())
+   return perf_guest_cbs->get_ip();
 
return instruction_pointer(regs);
 }
@@ -120,9 +120,13 @@ unsigned long perf_instruction_pointer(struct pt_regs 
*regs)
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
int misc = 0;
+   unsigned int state = 0;
 
-   if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
-   if (perf_guest_cbs->is_user_mode())
+   if (perf_guest_cbs)
+   state = perf_guest_cbs->state();
+
+   if (perf_guest_cbs && state) {
+   if (state & PERF_GUEST_USER)
misc |= PERF_RECORD_MISC_GUEST_USER;
else
misc |= PERF_RECORD_MISC_GUEST_KERNEL;
diff --git a/arch/arm64/kernel/perf_callchain.c 
b/arch/arm64/kernel/perf_callchain.c
index 4a72c2727309..1b344e23fd2f 100644
--- a/arch/arm64/kernel/perf_callchain.c
+++ b/arch/arm64/kernel/perf_callchain.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2015 ARM Limited
  */
 #include 
+#include 
 #include 
 
 #include 
@@ -99,10 +100,25 @@ compat_user_backtrace(struct compat_frame_tail __user 
*tail,
 }
 #endif /* CONFIG_COMPAT */
 
+DEFINE_STATIC_CALL_RET0(arm64_guest_state, *(perf_guest_cbs->state));
+DEFINE_STATIC_CALL_RET0(arm64_guest_get_ip, *(perf_guest_cbs->get_ip));
+
+void arch_perf_update_guest_cbs(void)
+{
+   static_call_update(arm64_guest_state, (void *)&__static_call_return0);
+   static_call_update(arm64_guest_get_ip, (void *)&__static_call_return0);
+
+   if (perf_guest_cbs && perf_guest_cbs->state)
+   static_call_update(arm64_guest_state, perf_guest_cbs->state);
+
+   if (perf_guest_cbs && perf_guest_cbs->get_ip)
+   static_call_update(arm64_guest_get_ip, perf_guest_cbs->get_ip);
+}
+
 void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 struct pt_regs *regs)
 {
-   if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+   if (static_call(arm64_guest_state)()) {
/* We don't support guest os callchain now */

Re: [PATCH V8 01/18] perf/core: Use static_call to optimize perf_guest_info_callbacks

2021-07-22 Thread Zhu, Lingshan



On 7/21/2021 7:57 PM, Like Xu wrote:

On 16/7/2021 4:53 pm, Zhu Lingshan wrote:

+    } else if (xenpmu_data->pmu.r.regs.cpl & 3)

oh, my typo, will fix in V9

Thanks


Lingshan, are you serious about this version?

arch/x86/xen/pmu.c:438:9: error: expected identifier or ‘(’ before 
‘return’

  438 | return state;
  | ^~
arch/x86/xen/pmu.c:439:1: error: expected identifier or ‘(’ before ‘}’ 
token

  439 | }
  | ^
arch/x86/xen/pmu.c: In function ‘xen_guest_state’:
arch/x86/xen/pmu.c:436:9: error: control reaches end of non-void 
function [-Werror=return-type]

  436 | }
  | ^
cc1: some warnings being treated as errors


+    state |= PERF_GUEST_USER;
  }


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V8 01/18] perf/core: Use static_call to optimize perf_guest_info_callbacks

2021-07-22 Thread Zhu, Lingshan



On 7/21/2021 7:57 PM, Like Xu wrote:

On 16/7/2021 4:53 pm, Zhu Lingshan wrote:

+    } else if (xenpmu_data->pmu.r.regs.cpl & 3)


Lingshan, are you serious about this version?

arch/x86/xen/pmu.c:438:9: error: expected identifier or ‘(’ before 
‘return’

  438 | return state;
  | ^~
arch/x86/xen/pmu.c:439:1: error: expected identifier or ‘(’ before ‘}’ 
token

  439 | }
  | ^
arch/x86/xen/pmu.c: In function ‘xen_guest_state’:
arch/x86/xen/pmu.c:436:9: error: control reaches end of non-void 
function [-Werror=return-type]

  436 | }
  | ^
cc1: some warnings being treated as errors


+    state |= PERF_GUEST_USER;
  }

I forgot to enable the XEN build in .config; a V9 fixing this will come soon.
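
For reference, the callback has to return a value on every path; a sketch of
the expected shape, based only on the fragment and error output above (the
PERF_GUEST_ACTIVE bit name is an assumption, and the guest-detection checks
of the old xen_is_in_guest() are omitted for brevity):

static unsigned int xen_guest_state(void)
{
	const struct xen_pmu_data *xenpmu_data = get_xenpmu_data();
	unsigned int state = 0;

	if (!xenpmu_data)
		return state;

	state |= PERF_GUEST_ACTIVE;	/* assumed "in guest" bit */

	if (xenpmu_data->pmu.pmu_flags & PMU_SAMPLE_PV) {
		if (xenpmu_data->pmu.pmu_flags & PMU_SAMPLE_USER)
			state |= PERF_GUEST_USER;
	} else if (xenpmu_data->pmu.r.regs.cpl & 3) {
		state |= PERF_GUEST_USER;
	}

	return state;	/* every path returns, keeping -Wreturn-type quiet */
}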
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] doc/arm: take care restore order of GICR_* in ITS restore

2021-07-22 Thread Marc Zyngier
On Thu, 22 Jul 2021 03:49:52 +0100,
Jianyong Wu  wrote:
> 
> Hello Marc,
> 
> > -Original Message-
> > From: Marc Zyngier 
> > Sent: Wednesday, July 21, 2021 5:54 PM
> > To: Jianyong Wu 
> > Cc: James Morse ; Andre Przywara
> > ; lushenm...@huawei.com;
> > k...@vger.kernel.org; kvmarm@lists.cs.columbia.edu; linux-
> > d...@vger.kernel.org; linux-ker...@vger.kernel.org; Justin He
> > 
> > Subject: Re: [PATCH] doc/arm: take care restore order of GICR_* in ITS
> > restore
> >
> > On Wed, 21 Jul 2021 10:20:19 +0100,
> > Jianyong Wu  wrote:
> > >
> > > When restoring the GIC/ITS, GICR_CTLR must be restored after GICR_PROPBASER
> > > and GICR_PENDBASER. That is important, as both GICR_PROPBASER and
> > > GICR_PENDBASER will fail to be loaded if LPIs have already been enabled in
> > > GICR_CTLR. Keeping the restore order above avoids that issue.
> > > Calling it out in the doc is very helpful and may avoid lots of debugging work.
> >
> > But that's something that is already mandated by the architecture, isn't it?
> > See "5.1 LPIs" in the architecture spec:
> >
> > 
> >
> > If GICR_PROPBASER is updated when GICR_CTLR.EnableLPIs == 1, the effects
> > are UNPREDICTABLE.
> >
> > [...]
> >
> > If GICR_PENDBASER is updated when GICR_CTLR.EnableLPIs == 1, the effects
> > are UNPREDICTABLE.
> >
> 
> I think this "UNPREDICTABLE" related with the "physical machine". Am
> I right?

No, you are unfortunately wrong. The architecture applies to *any*
implementation, and makes no distinction between a HW implementation
and a SW version. This is why we call it an architecture, and not an
implementation.

> In a virtualization environment, the kernel gives a definite answer:
> we should not enable GICR_CTLR.EnableLPIs before restoring
> GICR_PROPBASER (or GICR_PENDBASER) when restoring the GIC ITS from the
> VMM, see [1]. Thus, should we consider the virtualization environment
> as a special case?

Absolutely not. If you start special casing things, then we end up
having stupidly designed SW that tries to do stupid things based on
the supposed properties of an implementation.

We follow the architecture, all the architecture, nothing but the
architecture. The architecture is the only barrier between insanity
and pure madness! ;-)

>
> [1] linux/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> static void vgic_mmio_write_propbase(struct kvm_vcpu *vcpu,
>  gpa_t addr, unsigned int len,
>  unsigned long val)
> {
> struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> u64 old_propbaser, propbaser;
> 
> /* Storing a value with LPIs already enabled is undefined */
> if (vgic_cpu->lpis_enabled)
>return;
> ...
> }

Do you see how the kernel does exactly what the architecture says we
can do? Ignoring the write is a perfectly valid implementation of
UNPREDICTABLE.

So what we do is completely in line with the architecture. As such, no
need to document it any further, everything is already where it should
be. If someone tries to write code dealing with the GIC without
understanding the architecture, no amount of additional documentation
will help.
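
(For readers skimming the thread, the ordering under discussion boils down
to the VMM-side sequence sketched below; restore_reg() is a hypothetical
helper that writes a saved value through KVM_DEV_ARM_VGIC_GRP_REDIST_REGS,
not an existing API.)

	/* Base registers first, while GICR_CTLR.EnableLPIs is still 0 ... */
	restore_reg(vcpu_fd, GICR_PROPBASER);
	restore_reg(vcpu_fd, GICR_PENDBASER);

	/* ... and only then GICR_CTLR, which may set EnableLPIs. */
	restore_reg(vcpu_fd, GICR_CTLR);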

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


RE: [PATCH v2 3/3] kvm/arm: Align the VMID allocation with the arm64 ASID one

2021-07-22 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Will Deacon [mailto:w...@kernel.org]
> Sent: 21 July 2021 17:32
> To: Shameerali Kolothum Thodi 
> Cc: linux-arm-ker...@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-ker...@vger.kernel.org; m...@kernel.org; catalin.mari...@arm.com;
> james.mo...@arm.com; julien.thierry.k...@gmail.com;
> suzuki.poul...@arm.com; jean-phili...@linaro.org;
> alexandru.eli...@arm.com; Linuxarm ;
> qper...@google.com
> Subject: Re: [PATCH v2 3/3] kvm/arm: Align the VMID allocation with the
> arm64 ASID one
> 
> [+Quentin]
> 
> On Wed, Jun 16, 2021 at 04:56:06PM +0100, Shameer Kolothum wrote:
> > From: Julien Grall 
> >
> > At the moment, the VMID algorithm will send an SGI to all the CPUs to
> > force an exit and then broadcast a full TLB flush and I-Cache
> > invalidation.
> >
> > This patch uses the new VMID allocator. The
> > benefits are:
> > - CPUs are not forced to exit at roll-over. Instead the VMID will be
> > marked reserved and the context will be flushed at next exit. This
> > will reduce the IPI traffic.
> > - Context invalidation is now per-CPU rather than broadcasted.
> > - Catalin has a formal model of the ASID allocator.
> >
> > With the new algo, the code is now adapted:
> > - The function __kvm_flush_vm_context() has been renamed to
> > __kvm_tlb_flush_local_all() and now only flushes the current CPU
> > context.
> > - The call to update_vmid() will be done with preemption disabled
> > as the new algo requires to store information per-CPU.
> > - The TLBs associated to EL1 will be flushed when booting a CPU to
> > deal with stale information. This was previously done on the
> > allocation of the first VMID of a new generation.
> >
> > Signed-off-by: Julien Grall 
> > Signed-off-by: Shameer Kolothum
> 
> > ---
> >  arch/arm64/include/asm/kvm_asm.h  |   4 +-
> >  arch/arm64/include/asm/kvm_host.h |   6 +-
> >  arch/arm64/include/asm/kvm_mmu.h  |   3 +-
> >  arch/arm64/kvm/Makefile   |   2 +-
> >  arch/arm64/kvm/arm.c  | 115 +++---
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c|   6 +-
> >  arch/arm64/kvm/hyp/nvhe/mem_protect.c |   3 +-
> >  arch/arm64/kvm/hyp/nvhe/tlb.c |  10 +--
> >  arch/arm64/kvm/hyp/vhe/tlb.c  |  10 +--
> >  arch/arm64/kvm/mmu.c  |   1 -
> >  10 files changed, 52 insertions(+), 108 deletions(-)
> 
> [...]
> 
> > diff --git a/arch/arm64/include/asm/kvm_host.h
> b/arch/arm64/include/asm/kvm_host.h
> > index 75a7e8071012..d96284da8571 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -70,9 +70,7 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> >  void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
> >
> >  struct kvm_vmid {
> > -   /* The VMID generation used for the virt. memory system */
> > -   u64vmid_gen;
> > -   u32vmid;
> > +   atomic64_t id;
> 
> Maybe a typedef would be better if this is the only member of the structure?

Ok.

> 
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index 4b60c0056c04..a02c4877a055 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -106,8 +106,7 @@ int kvm_host_prepare_stage2(void *mem_pgt_pool,
> void *dev_pgt_pool)
> > mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd);
> > > mmu->arch = &host_kvm.arch;
> > > mmu->pgt = &host_kvm.pgt;
> > -   mmu->vmid.vmid_gen = 0;
> > -   mmu->vmid.vmid = 0;
> > > +   atomic64_set(&mmu->vmid.id, 0);
> 
> I think this is the first atomic64 use in the EL2 object, which may pull in
> some fatal KCSAN instrumentation. Quentin, have you run into this before?
> 
> Might be simple just to zero-initialise mmu for now, if it isn't already.

I will check that.

> 
> > diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c
> b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > index 83dc3b271bc5..42df9931ed9a 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > @@ -140,10 +140,10 @@ void __kvm_flush_cpu_context(struct
> kvm_s2_mmu *mmu)
> > __tlb_switch_to_host();
> >  }
> >
> > -void __kvm_flush_vm_context(void)
> > +void __kvm_tlb_flush_local_all(void)
> >  {
> > -   dsb(ishst);
> > -   __tlbi(alle1is);
> > +   dsb(nshst);
> > +   __tlbi(alle1);
> >
> > /*
> >  * VIPT and PIPT caches are not affected by VMID, so no maintenance
> > @@ -155,7 +155,7 @@ void __kvm_flush_vm_context(void)
> >  *
> >  */
> > if (icache_is_vpipt())
> > -   asm volatile("ic ialluis");
> > +   asm volatile("ic iallu" : : );
> >
> > -   dsb(ish);
> > +   dsb(nsh);
> 
> Hmm, I'm wondering whether having this local stuff really makes sense for
> VMIDs. For ASIDs, where rollover can be frequent and TLBI could result in
> IPI on 32-bit, the local option was important, but here rollover is less
> frequent, DVM is relied upon to work and the cost of a hypercall is
> 

RE: [PATCH v2 2/3] kvm/arm: Introduce a new vmid allocator for KVM

2021-07-22 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Will Deacon [mailto:w...@kernel.org]
> Sent: 21 July 2021 17:06
> To: Shameerali Kolothum Thodi 
> Cc: linux-arm-ker...@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-ker...@vger.kernel.org; m...@kernel.org; catalin.mari...@arm.com;
> james.mo...@arm.com; julien.thierry.k...@gmail.com;
> suzuki.poul...@arm.com; jean-phili...@linaro.org;
> alexandru.eli...@arm.com; Linuxarm 
> Subject: Re: [PATCH v2 2/3] kvm/arm: Introduce a new vmid allocator for KVM
> 
> On Wed, Jun 16, 2021 at 04:56:05PM +0100, Shameer Kolothum wrote:
> > A new VMID allocator for arm64 KVM use. This is based on
> > the arm64 ASID allocator algorithm.
> >
> > Signed-off-by: Shameer Kolothum
> 
> > ---
> >  arch/arm64/include/asm/kvm_host.h |   4 +
> >  arch/arm64/kvm/vmid.c | 206
> ++
> >  2 files changed, 210 insertions(+)
> >  create mode 100644 arch/arm64/kvm/vmid.c
> 
> Generally, I prefer this to the alternative of creating a library. However,
> I'd probably remove all the duplicated comments in favour of a reference
> to the ASID allocator. That way, we can just comment any VMID-specific
> behaviour in here.

Agree. I retained the comments mainly for myself as it's very difficult at times
to follow :)

> 
> Some comments below...
> 
> > diff --git a/arch/arm64/include/asm/kvm_host.h
> b/arch/arm64/include/asm/kvm_host.h
> > index 7cd7d5c8c4bc..75a7e8071012 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -680,6 +680,10 @@ int kvm_arm_pvtime_get_attr(struct kvm_vcpu
> *vcpu,
> >  int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
> > struct kvm_device_attr *attr);
> >
> > +int kvm_arm_vmid_alloc_init(void);
> > +void kvm_arm_vmid_alloc_free(void);
> > +void kvm_arm_update_vmid(atomic64_t *id);
> > +
> >  static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch
> *vcpu_arch)
> >  {
> > vcpu_arch->steal.base = GPA_INVALID;
> > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
> > new file mode 100644
> > index ..687e18d33130
> > --- /dev/null
> > +++ b/arch/arm64/kvm/vmid.c
> > @@ -0,0 +1,206 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * VMID allocator.
> > + *
> > + * Based on arch/arm64/mm/context.c
> > + *
> > + * Copyright (C) 2002-2003 Deep Blue Solutions Ltd, all rights reserved.
> > + * Copyright (C) 2012 ARM Ltd.
> > + */
> > +
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +
> > +static u32 vmid_bits;
> > +static DEFINE_RAW_SPINLOCK(cpu_vmid_lock);
> > +
> > +static atomic64_t vmid_generation;
> > +static unsigned long *vmid_map;
> > +
> > +static DEFINE_PER_CPU(atomic64_t, active_vmids);
> > +static DEFINE_PER_CPU(u64, reserved_vmids);
> > +static cpumask_t tlb_flush_pending;
> > +
> > +#define VMID_MASK  (~GENMASK(vmid_bits - 1, 0))
> > +#define VMID_FIRST_VERSION (1UL << vmid_bits)
> > +
> > +#define NUM_USER_VMIDS VMID_FIRST_VERSION
> > +#define vmid2idx(vmid) ((vmid) & ~VMID_MASK)
> > +#define idx2vmid(idx)  vmid2idx(idx)
> > +
> > +#define vmid_gen_match(vmid) \
> > +   (!(((vmid) ^ atomic64_read(&vmid_generation)) >> vmid_bits))
> > +
> > +static void flush_context(void)
> > +{
> > +   int cpu;
> > +   u64 vmid;
> > +
> > +   bitmap_clear(vmid_map, 0, NUM_USER_VMIDS);
> > +
> > +   for_each_possible_cpu(cpu) {
> > +   vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids, cpu), 0);
> > +   /*
> > +* If this CPU has already been through a
> > +* rollover, but hasn't run another task in
> > +* the meantime, we must preserve its reserved
> > +* VMID, as this is the only trace we have of
> > +* the process it is still running.
> > +*/
> > +   if (vmid == 0)
> > +   vmid = per_cpu(reserved_vmids, cpu);
> > +   __set_bit(vmid2idx(vmid), vmid_map);
> > +   per_cpu(reserved_vmids, cpu) = vmid;
> > +   }
> 
> Hmm, so here we're copying the active_vmids into the reserved_vmids on a
> rollover, but I wonder if that's overly pessismistic? For the ASID
> allocator, every CPU tends to have a current task so it makes sense, but
> I'm not sure it's necessarily the case that every CPU tends to have a
> vCPU as the current task. For example, imagine you have a nasty 128-CPU
> system with 8-bit VMIDs and each CPU has at some point run a vCPU. Then,
> on rollover, we'll immediately reserve half of the VMID space, even if
> those vCPUs don't even exist any more.
> 
> Not sure if it's worth worrying about, but I wanted to mention it.

Ok. I see your suggestion in patch #3 to avoid this.

> 
> > +void kvm_arm_update_vmid(atomic64_t *id)
> > +{
> 
> Take the kvm_vmid here? That would make:
> 
> > +   /* Check that our VMID belongs to the current generation. */
> > +   vmid = atomic64_read(id);
> > +   if (!vmid_gen_match(vmid)) {
> > +   vmid = 

RE: [PATCH v2 1/3] arch/arm64: Introduce a capability to tell whether 16-bit VMID is available

2021-07-22 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Will Deacon [mailto:w...@kernel.org]
> Sent: 21 July 2021 16:23
> To: Shameerali Kolothum Thodi 
> Cc: linux-arm-ker...@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-ker...@vger.kernel.org; m...@kernel.org; catalin.mari...@arm.com;
> james.mo...@arm.com; julien.thierry.k...@gmail.com;
> suzuki.poul...@arm.com; jean-phili...@linaro.org;
> alexandru.eli...@arm.com; Linuxarm 
> Subject: Re: [PATCH v2 1/3] arch/arm64: Introduce a capability to tell whether
> 16-bit VMID is available
> 
> On Wed, Jun 16, 2021 at 04:56:04PM +0100, Shameer Kolothum wrote:
> > From: Julien Grall 
> >
> > At the moment, the function kvm_get_vmid_bits() is looking up for the
> > sanitized value of ID_AA64MMFR1_EL1 and extract the information
> > regarding the number of VMID bits supported.
> >
> > This is fine as the function is mainly used during VMID roll-over. New
> > use in a follow-up patch will require the function to be called at every
> > context switch, so we want the function to be more efficient.
> >
> > A new capability is introduced to tell whether 16-bit VMID is
> > available.
> 
> I don't really buy this rationale. The VMID allocator introduced later on
> caches this value in the static 'vmid_bits' variable, and that gets used
> on vCPU enter via vmid_gen_match() in the kvm_arm_update_vmid() fastpath.
> 
> So I would prefer that we just expose an accessor for that than introduce
> a static key and new cpufeature just for kvm_get_vttbr().

Ok. Will change it to an accessor.
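
Something along these lines (a sketch; names assumed): cache the width once
in the allocator and expose it, instead of adding a cpufeature/static key
just for kvm_get_vttbr():

static u32 vmid_bits;	/* set once in kvm_arm_vmid_alloc_init() */

u32 kvm_arm_vmid_bits(void)
{
	return vmid_bits;
}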

Thanks,
Shameer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm