Re: [PATCH 1/5] vfio/iommu_type1: Fixes vfio_dma_populate_bitmap to avoid dirty lose
On Thu, 7 Jan 2021 17:28:57 +0800 Keqian Zhu wrote:

> Deferring the check of whether a vfio_dma is fully dirty to
> update_user_bitmap makes it easy to lose the dirty log. For example,
> after promoting the pinned_scope of vfio_iommu, a vfio_dma is no longer
> considered fully dirty, so we may lose dirty log that occurred before
> vfio_iommu was promoted.
>
> The key point is that pinned-dirty is not a real dirty tracking
> mechanism; it can't continuously track dirty pages, it only restricts
> the dirty scope. It is essentially the same as fully-dirty: fully-dirty
> is of full scope and pinned-dirty is of pinned scope.
>
> So we must mark pinned-dirty or fully-dirty after we start dirty
> tracking or clear the dirty bitmap, to ensure that the dirty log is
> marked right away.

I was initially convinced by these first three patches, but upon further
review, I think the premise is wrong. AIUI, the concern across these
patches is that our dirty bitmap is only populated with pages dirtied by
pinning, and we only take the pinned-page dirty scope into account at
the time the bitmap is retrieved by the user. You suppose this presents
a gap: if a vendor driver has not yet identified itself with a
page-pinning scope, the entire bitmap should be considered dirty,
regardless of whether that driver later pins pages prior to the user
retrieving the dirty bitmap.

I don't think this is how we intended the cooperation between the iommu
driver and the vendor driver to work. By pinning pages a vendor driver
is not declaring that their future dirty page scope is limited only to
pinned pages; instead, they're declaring themselves a participant in
dirty page tracking and taking responsibility for pinning any necessary
pages. For example, we might extend VFIO_IOMMU_DIRTY_PAGES_FLAG_START to
trigger a blocking notification to groups not only to begin dirty
tracking, but also to synchronously register their current device DMA
footprint.
This patch would require a vendor driver to possibly perform a
gratuitous page pinning in order to set the scope prior to dirty logging
being enabled, or else the initial bitmap will be fully dirty.
Therefore, I don't see that this series is necessary or correct. Kirti,
does this match your thinking?

Thinking about these semantics, it seems there might still be an issue
if a group with non-pinned-page dirty scope is detached with dirty
logging enabled. It seems this should in fact fully populate the dirty
bitmaps at the time it's removed, since we don't know the extent of its
previous DMA, nor will the group be present to trigger the full bitmap
when the user retrieves the dirty bitmap. Creating fully populated
bitmaps at the time tracking is enabled negates our ability to take
advantage of later enlightenment, though.

Thanks,
Alex

> Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty
> pages tracking")
> Signed-off-by: Keqian Zhu
> ---
>  drivers/vfio/vfio_iommu_type1.c | 33 ++---
>  1 file changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index bceda5e8baaa..b0a26e8e0adf 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -224,7 +224,7 @@ static void vfio_dma_bitmap_free(struct vfio_dma *dma)
>  	dma->bitmap = NULL;
>  }
>
> -static void vfio_dma_populate_bitmap(struct vfio_dma *dma, size_t pgsize)
> +static void vfio_dma_populate_bitmap_pinned(struct vfio_dma *dma, size_t pgsize)
>  {
>  	struct rb_node *p;
>  	unsigned long pgshift = __ffs(pgsize);
> @@ -236,6 +236,25 @@ static void vfio_dma_populate_bitmap(struct vfio_dma *dma, size_t pgsize)
>  	}
>  }
>
> +static void vfio_dma_populate_bitmap_full(struct vfio_dma *dma, size_t pgsize)
> +{
> +	unsigned long pgshift = __ffs(pgsize);
> +	unsigned long nbits = dma->size >> pgshift;
> +
> +	bitmap_set(dma->bitmap, 0, nbits);
> +}
> +
> +static void vfio_dma_populate_bitmap(struct vfio_iommu *iommu,
> +				     struct vfio_dma *dma)
> +{
> +	size_t pgsize = (size_t)1 << __ffs(iommu->pgsize_bitmap);
> +
> +	if (iommu->pinned_page_dirty_scope)
> +		vfio_dma_populate_bitmap_pinned(dma, pgsize);
> +	else if (dma->iommu_mapped)
> +		vfio_dma_populate_bitmap_full(dma, pgsize);
> +}
> +
>  static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu)
>  {
>  	struct rb_node *n;
> @@ -257,7 +276,7 @@ static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu)
>  			}
>  			return ret;
>  		}
> -		vfio_dma_populate_bitmap(dma, pgsize);
> +		vfio_dma_populate_bitmap(iommu, dma);
>  	}
>  	return 0;
>  }
> @@ -987,13 +1006,6 @@ static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
>  	unsigned long shift = bit_offset % BITS_PER_LONG;
>  	unsigned long leftover;
>
> -	/*
> -	 * mark all pages dirty if any I
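An editor's note for readers following the thread: the population logic in the quoted patch can be modeled outside the kernel. The sketch below is a standalone model, not kernel code — vfio_dma, vfio_iommu and the bitmap are simplified stand-ins (fixed page count, plain arrays) that only illustrate the pinned-scope vs. fully-dirty decision being debated above.

```c
/*
 * Standalone model of the bitmap-population decision discussed above.
 * Not kernel code: the structures below borrow the patch's names but
 * are reduced to the minimum needed to show the two population modes.
 */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 8

struct vfio_dma {
    bool iommu_mapped;
    bool pinned[NPAGES];          /* pages dirtied by vendor-driver pinning */
    unsigned char bitmap[NPAGES]; /* dirty bitmap, one byte per page here */
};

struct vfio_iommu {
    bool pinned_page_dirty_scope;
};

/* Mirrors the patch's split: pinned scope marks only pinned pages,
 * otherwise any iommu-mapped range must be assumed fully dirty. */
static void vfio_dma_populate_bitmap(struct vfio_iommu *iommu,
                                     struct vfio_dma *dma)
{
    if (iommu->pinned_page_dirty_scope) {
        for (int i = 0; i < NPAGES; i++)
            if (dma->pinned[i])
                dma->bitmap[i] = 1;
    } else if (dma->iommu_mapped) {
        memset(dma->bitmap, 1, NPAGES);  /* fully dirty */
    }
}
```

Under pinned-page scope only the pages the vendor driver actually pinned are reported dirty; without that scope, the whole mapped range is reported dirty — which is exactly why the timing of the scope promotion matters in the exchange above.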
Re: [RFC PATCH v2 15/26] of/fdt: Introduce early_init_dt_add_memory_hyp()
On Tuesday 12 Jan 2021 at 10:45:56 (-0600), Rob Herring wrote:
> Umm, yes you are right. But both are dealing with nomap. So someone
> needs to sort out what the right thing to do here is. No one cared
> enough to follow up in a year and a half.

Fair enough, happy to do that. I'll send a small series with these two
patches independently from this series, which may take a while to land.

Thanks,
Quentin
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC PATCH v2 15/26] of/fdt: Introduce early_init_dt_add_memory_hyp()
On Tuesday 12 Jan 2021 at 09:53:36 (-0600), Rob Herring wrote: > On Tue, Jan 12, 2021 at 8:26 AM Quentin Perret wrote: > > > > On Tuesday 12 Jan 2021 at 08:10:47 (-0600), Rob Herring wrote: > > > On Tue, Jan 12, 2021 at 3:51 AM Quentin Perret wrote: > > > > > > > > On Monday 11 Jan 2021 at 08:45:10 (-0600), Rob Herring wrote: > > > > > On Fri, Jan 8, 2021 at 6:16 AM Quentin Perret > > > > > wrote: > > > > > > > > > > > > Introduce early_init_dt_add_memory_hyp() to allow KVM to conserve a > > > > > > copy > > > > > > of the memory regions parsed from DT. This will be needed in the > > > > > > context > > > > > > of the protected nVHE feature of KVM/arm64 where the code running > > > > > > at EL2 > > > > > > will be cleanly separated from the host kernel during boot, and will > > > > > > need its own representation of memory. > > > > > > > > > > What happened to doing this with memblock? > > > > > > > > I gave it a go, but as mentioned in v1, I ran into issues for nomap > > > > regions. I want the hypervisor to know about these memory regions (it's > > > > possible some of those will be given to protected guests for instance) > > > > but these seem to be entirely removed from the memblocks when using DT: > > > > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/of/fdt.c#L1153 > > > > > > > > EFI appears to do things differently, though, as it 'just' uses > > > > memblock_mark_nomap() instead of actively removing the memblock. And > > > > that > > > > means I could actually use the memblock API for EFI, but I'd rather > > > > have a common solution. I tried to understand why things are done > > > > differently but couldn't find an answer and kept things simple and > > > > working for now. > > > > > > > > Is there a good reason for not using memblock_mark_nomap() with DT? If > > > > not, I'm happy to try that. > > > > > > There were 2 patches to do that, but it never got resolved. See here[1]. > > > > Thanks. 
So the DT stuff predates the introduction of memblock_mark_nomap,
> > that's why...
> >
> > By reading the discussions, [1] still looks a sensible patch on its
> > own, independently from the issue Nicolas tried to solve. Any reason
> > for not applying it?
>
> As I mentioned in the thread, same patch with 2 different reasons. So
> I just wanted a better commit message covering both.

Sorry if I'm being thick, but I'm not seeing it. How are they the same?
IIUC, as per Nicolas' last reply, using memblock_mark_nomap() does not
solve his issue with a broken DT. These 2 patches address two completely
separate issues, no?

Thanks,
Quentin
Re: [PATCH 4/5] vfio/iommu_type1: Carefully use unmap_unpin_all during dirty tracking
On Tue, 12 Jan 2021 20:04:38 +0800 Keqian Zhu wrote:

> On 2021/1/12 5:49, Alex Williamson wrote:
> > On Thu, 7 Jan 2021 17:29:00 +0800
> > Keqian Zhu wrote:
> >
> >> If we detach a group during dirty page tracking, we shouldn't remove
> >> vfio_dma, because the dirty log will be lost.
> >>
> >> But we don't prevent unmap_unpin_all in vfio_iommu_release, because
> >> under normal procedure, dirty tracking has been stopped.
> >
> > This looks like it's creating a larger problem than it's fixing, it's
> > not our job to maintain the dirty bitmap regardless of what the user
> > does. If the user detaches the last group in a container causing the
> > mappings within that container to be deconstructed before the user
> > has collected dirty pages, that sounds like a user error. A container
> > with no groups is de-privileged and therefore loses all state.
> > Thanks,
> >
> > Alex
>
> Hi Alex,
>
> This looks good to me ;-). That's a reasonable constraint for user
> behavior.
>
> What about replacing this patch with an addition to the uapi document
> of VFIO_GROUP_UNSET_CONTAINER? Users should pay attention to this when
> calling this ioctl during dirty tracking.

Here's the current uapi comment:

/**
 * VFIO_GROUP_UNSET_CONTAINER - _IO(VFIO_TYPE, VFIO_BASE + 5)
 *
 * Remove the group from the attached container.  This is the
 * opposite of the SET_CONTAINER call and returns the group to
 * an initial state.  All device file descriptors must be released
 * prior to calling this interface.  When removing the last group
 * from a container, the IOMMU will be disabled and all state lost,
 * effectively also returning the VFIO file descriptor to an initial
 * state.
 * Return: 0 on success, -errno on failure.
 * Availability: When attached to container
 */

So we already indicate that "all state" of the container is lost when
removing the last group; I don't see that it's necessary to explicitly
call out dirty bitmap state beyond that statement. Without mappings
there can be no dirty bitmap to track.
> And any comments on other patches? thanks.

I had a difficult time mapping the commit log to the actual code change;
I'll likely have some wording suggestions. Is patch 5/5 still necessary
if this patch is dropped? Thanks,

Alex

> >> Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty
> >> pages tracking")
> >> Signed-off-by: Keqian Zhu
> >> ---
> >>  drivers/vfio/vfio_iommu_type1.c | 14 --
> >>  1 file changed, 12 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >> index 26b7eb2a5cfc..9776a059904d 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -2373,7 +2373,12 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> >>  		if (list_empty(&iommu->external_domain->group_list)) {
> >>  			vfio_sanity_check_pfn_list(iommu);
> >>
> >> -			if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
> >> +			/*
> >> +			 * During dirty page tracking, we can't remove
> >> +			 * vfio_dma because dirty log will lose.
> >> +			 */
> >> +			if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) &&
> >> +			    !iommu->dirty_page_tracking)
> >>  				vfio_iommu_unmap_unpin_all(iommu);
> >>
> >>  			kfree(iommu->external_domain);
> >> @@ -2406,10 +2411,15 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> >>  	 * iommu and external domain doesn't exist, then all the
> >>  	 * mappings go away too. If it's the last domain with iommu and
> >>  	 * external domain exist, update accounting
> >> +	 *
> >> +	 * Note: During dirty page tracking, we can't remove vfio_dma
> >> +	 * because dirty log will lose. Just update accounting is a good
> >> +	 * choice.
> >>  	 */
> >>  	if (list_empty(&domain->group_list)) {
> >>  		if (list_is_singular(&iommu->domain_list)) {
> >> -			if (!iommu->external_domain)
> >> +			if (!iommu->external_domain &&
> >> +			    !iommu->dirty_page_tracking)
> >>  				vfio_iommu_unmap_unpin_all(iommu);
> >>  			else
> >>  				vfio_iommu_unmap_unpin_reaccount(iommu);
Re: [RFC PATCH v2 15/26] of/fdt: Introduce early_init_dt_add_memory_hyp()
On Tue, Jan 12, 2021 at 10:15 AM Quentin Perret wrote: > > On Tuesday 12 Jan 2021 at 09:53:36 (-0600), Rob Herring wrote: > > On Tue, Jan 12, 2021 at 8:26 AM Quentin Perret wrote: > > > > > > On Tuesday 12 Jan 2021 at 08:10:47 (-0600), Rob Herring wrote: > > > > On Tue, Jan 12, 2021 at 3:51 AM Quentin Perret > > > > wrote: > > > > > > > > > > On Monday 11 Jan 2021 at 08:45:10 (-0600), Rob Herring wrote: > > > > > > On Fri, Jan 8, 2021 at 6:16 AM Quentin Perret > > > > > > wrote: > > > > > > > > > > > > > > Introduce early_init_dt_add_memory_hyp() to allow KVM to conserve > > > > > > > a copy > > > > > > > of the memory regions parsed from DT. This will be needed in the > > > > > > > context > > > > > > > of the protected nVHE feature of KVM/arm64 where the code running > > > > > > > at EL2 > > > > > > > will be cleanly separated from the host kernel during boot, and > > > > > > > will > > > > > > > need its own representation of memory. > > > > > > > > > > > > What happened to doing this with memblock? > > > > > > > > > > I gave it a go, but as mentioned in v1, I ran into issues for nomap > > > > > regions. I want the hypervisor to know about these memory regions > > > > > (it's > > > > > possible some of those will be given to protected guests for instance) > > > > > but these seem to be entirely removed from the memblocks when using > > > > > DT: > > > > > > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/of/fdt.c#L1153 > > > > > > > > > > EFI appears to do things differently, though, as it 'just' uses > > > > > memblock_mark_nomap() instead of actively removing the memblock. And > > > > > that > > > > > means I could actually use the memblock API for EFI, but I'd rather > > > > > have a common solution. I tried to understand why things are done > > > > > differently but couldn't find an answer and kept things simple and > > > > > working for now. > > > > > > > > > > Is there a good reason for not using memblock_mark_nomap() with DT? 
If
> > > > > not, I'm happy to try that.
> > > >
> > > > There were 2 patches to do that, but it never got resolved. See
> > > > here[1].
> > >
> > > Thanks. So the DT stuff predates the introduction of
> > > memblock_mark_nomap, that's why...
> > >
> > > By reading the discussions, [1] still looks a sensible patch on its
> > > own, independently from the issue Nicolas tried to solve. Any
> > > reason for not applying it?
> >
> > As I mentioned in the thread, same patch with 2 different reasons. So
> > I just wanted a better commit message covering both.
>
> Sorry if I'm being thick, but I'm not seeing it. How are they the same?
> IIUC, as per Nicolas' last reply, using memblock_mark_nomap() does not
> solve his issue with a broken DT. These 2 patches address two
> completely separate issues no?

Umm, yes you are right. But both are dealing with nomap. So someone
needs to sort out what the right thing to do here is. No one cared
enough to follow up in a year and a half.

Rob
Re: [RFC PATCH v2 15/26] of/fdt: Introduce early_init_dt_add_memory_hyp()
On Tue, Jan 12, 2021 at 8:26 AM Quentin Perret wrote: > > On Tuesday 12 Jan 2021 at 08:10:47 (-0600), Rob Herring wrote: > > On Tue, Jan 12, 2021 at 3:51 AM Quentin Perret wrote: > > > > > > On Monday 11 Jan 2021 at 08:45:10 (-0600), Rob Herring wrote: > > > > On Fri, Jan 8, 2021 at 6:16 AM Quentin Perret > > > > wrote: > > > > > > > > > > Introduce early_init_dt_add_memory_hyp() to allow KVM to conserve a > > > > > copy > > > > > of the memory regions parsed from DT. This will be needed in the > > > > > context > > > > > of the protected nVHE feature of KVM/arm64 where the code running at > > > > > EL2 > > > > > will be cleanly separated from the host kernel during boot, and will > > > > > need its own representation of memory. > > > > > > > > What happened to doing this with memblock? > > > > > > I gave it a go, but as mentioned in v1, I ran into issues for nomap > > > regions. I want the hypervisor to know about these memory regions (it's > > > possible some of those will be given to protected guests for instance) > > > but these seem to be entirely removed from the memblocks when using DT: > > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/of/fdt.c#L1153 > > > > > > EFI appears to do things differently, though, as it 'just' uses > > > memblock_mark_nomap() instead of actively removing the memblock. And that > > > means I could actually use the memblock API for EFI, but I'd rather > > > have a common solution. I tried to understand why things are done > > > differently but couldn't find an answer and kept things simple and > > > working for now. > > > > > > Is there a good reason for not using memblock_mark_nomap() with DT? If > > > not, I'm happy to try that. > > > > There were 2 patches to do that, but it never got resolved. See here[1]. > > Thanks. So the DT stuff predates the introduction of memblock_mark_nomap, > that's why... 
> By reading the discussions, [1] still looks a sensible patch on its
> own, independently from the issue Nicolas tried to solve. Any reason
> for not applying it?

As I mentioned in the thread, same patch with 2 different reasons. So
I just wanted a better commit message covering both.

Rob
Re: [RFC PATCH v2 15/26] of/fdt: Introduce early_init_dt_add_memory_hyp()
On Tue, Jan 12, 2021 at 3:51 AM Quentin Perret wrote:
>
> On Monday 11 Jan 2021 at 08:45:10 (-0600), Rob Herring wrote:
> > On Fri, Jan 8, 2021 at 6:16 AM Quentin Perret wrote:
> > >
> > > Introduce early_init_dt_add_memory_hyp() to allow KVM to conserve a copy
> > > of the memory regions parsed from DT. This will be needed in the context
> > > of the protected nVHE feature of KVM/arm64 where the code running at EL2
> > > will be cleanly separated from the host kernel during boot, and will
> > > need its own representation of memory.
> >
> > What happened to doing this with memblock?
>
> I gave it a go, but as mentioned in v1, I ran into issues for nomap
> regions. I want the hypervisor to know about these memory regions (it's
> possible some of those will be given to protected guests for instance)
> but these seem to be entirely removed from the memblocks when using DT:
>
> https://elixir.bootlin.com/linux/latest/source/drivers/of/fdt.c#L1153
>
> EFI appears to do things differently, though, as it 'just' uses
> memblock_mark_nomap() instead of actively removing the memblock. And that
> means I could actually use the memblock API for EFI, but I'd rather
> have a common solution. I tried to understand why things are done
> differently but couldn't find an answer and kept things simple and
> working for now.
>
> Is there a good reason for not using memblock_mark_nomap() with DT? If
> not, I'm happy to try that.

There were 2 patches to do that, but it never got resolved. See here[1].

Rob

[1] https://lore.kernel.org/linux-devicetree/?q=s%3Ano-map
Re: [RFC PATCH v2 15/26] of/fdt: Introduce early_init_dt_add_memory_hyp()
On Tuesday 12 Jan 2021 at 08:10:47 (-0600), Rob Herring wrote: > On Tue, Jan 12, 2021 at 3:51 AM Quentin Perret wrote: > > > > On Monday 11 Jan 2021 at 08:45:10 (-0600), Rob Herring wrote: > > > On Fri, Jan 8, 2021 at 6:16 AM Quentin Perret wrote: > > > > > > > > Introduce early_init_dt_add_memory_hyp() to allow KVM to conserve a copy > > > > of the memory regions parsed from DT. This will be needed in the context > > > > of the protected nVHE feature of KVM/arm64 where the code running at EL2 > > > > will be cleanly separated from the host kernel during boot, and will > > > > need its own representation of memory. > > > > > > What happened to doing this with memblock? > > > > I gave it a go, but as mentioned in v1, I ran into issues for nomap > > regions. I want the hypervisor to know about these memory regions (it's > > possible some of those will be given to protected guests for instance) > > but these seem to be entirely removed from the memblocks when using DT: > > > > https://elixir.bootlin.com/linux/latest/source/drivers/of/fdt.c#L1153 > > > > EFI appears to do things differently, though, as it 'just' uses > > memblock_mark_nomap() instead of actively removing the memblock. And that > > means I could actually use the memblock API for EFI, but I'd rather > > have a common solution. I tried to understand why things are done > > differently but couldn't find an answer and kept things simple and > > working for now. > > > > Is there a good reason for not using memblock_mark_nomap() with DT? If > > not, I'm happy to try that. > > There were 2 patches to do that, but it never got resolved. See here[1]. Thanks. So the DT stuff predates the introduction of memblock_mark_nomap, that's why... By reading the discussions, [1] still looks a sensible patch on its own, independently from the issue Nicolas tried to solve. Any reason for not applying it? I'll try to rework my series on top and see how that goes. 
Thanks,
Quentin

[1] https://lore.kernel.org/linux-devicetree/1562920284-18638-1-git-send-email-karah...@amazon.de/
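An editor's aside for readers following the thread: the behavioural difference under discussion — DT dropping no-map ranges from memblock entirely versus EFI's memblock_mark_nomap() keeping them visible with a flag — can be illustrated with a toy model. This is not the kernel memblock API; the structure and function names below are invented for the sketch.

```c
/*
 * Toy model of the two no-map behaviours discussed in this thread.
 * "remove" mimics DT today: the region disappears from the memory list,
 * so later consumers (e.g. the EL2 hypervisor) never see it. "mark
 * nomap" mimics EFI: the region stays listed, just flagged no-map.
 */
#include <assert.h>
#include <stdbool.h>

#define MAX_REGIONS 8

struct region { unsigned long base, size; bool nomap; };
struct mem    { struct region r[MAX_REGIONS]; int cnt; };

static void add(struct mem *m, unsigned long base, unsigned long size)
{
    m->r[m->cnt++] = (struct region){ base, size, false };
}

/* DT behaviour today: the region simply disappears from the list. */
static void remove_region(struct mem *m, int idx)
{
    for (int i = idx; i < m->cnt - 1; i++)
        m->r[i] = m->r[i + 1];
    m->cnt--;
}

/* EFI behaviour: region stays visible, flagged no-map. */
static void mark_nomap(struct mem *m, int idx)
{
    m->r[idx].nomap = true;
}
```

The hypervisor use case in the thread needs the second behaviour: the range must remain enumerable (it may later be handed to a protected guest) even though the host must not map it.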
Re: [PATCH 8/9] KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace
On 2021-01-12 17:28, Alexandru Elisei wrote:

> Hi Eric,
>
> On 12/12/20 6:50 PM, Eric Auger wrote:
>> Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the reporting of
>> GICR_TYPER.Last for userspace") temporarily fixed a bug identified
>> when attempting to access the GICR_TYPER register before the
>> redistributor region setting but dropped the support of the LAST
>> bit. This patch restores its support (if the redistributor region
>> was set) while keeping the code safe.
>
> If I understand your patch correctly, it is possible for the
> GICR_TYPER.Last bit to be transiently 1 if the register is accessed
> before all the redistributor regions have been configured. Arm IHI
> 0069F states that accesses to the GICR_TYPER register are RO. I
> haven't found exactly what RO means (please point me to the definition
> if you find it in the architecture!), but I assume it means read-only
> and I'm not sure how correct (from an architectural point of view) it
> is for two subsequent reads of this register to return different
> values. Maybe Marc can shed some light on this.

RO = Read-Only indeed. Not sure that's documented anywhere in the
architecture, but this is enough of a well-known acronym that even the
ARM ARM doesn't feel the need to invent a new definition for it.

As for your concern, I don't think it is a problem to return different
values if the HW has changed in between. This may need to be documented,
though.

Thanks,

M.
--
Jazz is not dead. It just smells funny...
Re: [PATCH 8/9] KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace
Hi Eric,

On 12/12/20 6:50 PM, Eric Auger wrote:
> Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the
> reporting of GICR_TYPER.Last for userspace") temporarily fixed
> a bug identified when attempting to access the GICR_TYPER
> register before the redistributor region setting but dropped
> the support of the LAST bit. This patch restores its
> support (if the redistributor region was set) while keeping the
> code safe.

If I understand your patch correctly, it is possible for the
GICR_TYPER.Last bit to be transiently 1 if the register is accessed
before all the redistributor regions have been configured. Arm IHI 0069F
states that accesses to the GICR_TYPER register are RO. I haven't found
exactly what RO means (please point me to the definition if you find it
in the architecture!), but I assume it means read-only and I'm not sure
how correct (from an architectural point of view) it is for two
subsequent reads of this register to return different values. Maybe Marc
can shed some light on this.

Thanks,
Alex

>
> Signed-off-by: Eric Auger
> ---
>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 7 ++-
>  include/kvm/arm_vgic.h | 1 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index 581f0f49..2f9ef6058f6e 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -277,6 +277,8 @@ static unsigned long vgic_uaccess_read_v3r_typer(struct kvm_vcpu *vcpu,
>  					 gpa_t addr, unsigned int len)
>  {
>  	unsigned long mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> +	struct vgic_redist_region *rdreg = vgic_cpu->rdreg;
>  	int target_vcpu_id = vcpu->vcpu_id;
>  	u64 value;
>
> @@ -286,7 +288,9 @@ static unsigned long vgic_uaccess_read_v3r_typer(struct kvm_vcpu *vcpu,
>  	if (vgic_has_its(vcpu->kvm))
>  		value |= GICR_TYPER_PLPIS;
>
> -	/* reporting of the Last bit is not supported for userspace */
> +	if (rdreg && (vgic_cpu->rdreg_index == (rdreg->free_index - 1)))
> +		value |= GICR_TYPER_LAST;
> +
>  	return extract_bytes(value, addr & 7, len);
>  }
>
> @@ -714,6 +718,7 @@ int vgic_register_redist_iodev(struct kvm_vcpu *vcpu)
>  		return -EINVAL;
>
>  	vgic_cpu->rdreg = rdreg;
> +	vgic_cpu->rdreg_index = rdreg->free_index;
>
>  	rd_base = rdreg->base + rdreg->free_index * KVM_VGIC_V3_REDIST_SIZE;
>
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index a8d8fdcd3723..596c069263a7 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -322,6 +322,7 @@ struct vgic_cpu {
>  	 */
>  	struct vgic_io_device rd_iodev;
>  	struct vgic_redist_region *rdreg;
> +	u32 rdreg_index;
>
>  	/* Contains the attributes and gpa of the LPI pending tables. */
>  	u64 pendbaser;
Re: [PATCH 8/9] KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace
Hi Eric,

On 12/12/20 6:50 PM, Eric Auger wrote:
> Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the
> reporting of GICR_TYPER.Last for userspace") temporarily fixed
> a bug identified when attempting to access the GICR_TYPER
> register before the redistributor region setting but dropped
> the support of the LAST bit. This patch restores its
> support (if the redistributor region was set) while keeping the
> code safe.

I suppose the reason for emulating GICR_TYPER.Last is for architecture
compliance, right? I think that should be in the commit message.

>
> Signed-off-by: Eric Auger
> ---
>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 7 ++-
>  include/kvm/arm_vgic.h | 1 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index 581f0f49..2f9ef6058f6e 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -277,6 +277,8 @@ static unsigned long vgic_uaccess_read_v3r_typer(struct kvm_vcpu *vcpu,
>  					 gpa_t addr, unsigned int len)
>  {
>  	unsigned long mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> +	struct vgic_redist_region *rdreg = vgic_cpu->rdreg;
>  	int target_vcpu_id = vcpu->vcpu_id;
>  	u64 value;
>
> @@ -286,7 +288,9 @@ static unsigned long vgic_uaccess_read_v3r_typer(struct kvm_vcpu *vcpu,
>  	if (vgic_has_its(vcpu->kvm))
>  		value |= GICR_TYPER_PLPIS;
>
> -	/* reporting of the Last bit is not supported for userspace */
> +	if (rdreg && (vgic_cpu->rdreg_index == (rdreg->free_index - 1)))
> +		value |= GICR_TYPER_LAST;
> +
>  	return extract_bytes(value, addr & 7, len);
>  }
>
> @@ -714,6 +718,7 @@ int vgic_register_redist_iodev(struct kvm_vcpu *vcpu)
>  		return -EINVAL;
>
>  	vgic_cpu->rdreg = rdreg;
> +	vgic_cpu->rdreg_index = rdreg->free_index;

What happens if the next redistributor region we register has the base
address adjacent to this one? I'm really not familiar with the code, but
is it not possible to create two Redistributor regions (via
KVM_DEV_ARM_VGIC_GRP_ADDR(KVM_VGIC_V3_ADDR_TYPE_REDIST)) where the
second Redistributor region start address is immediately after the last
Redistributor in the preceding region?

Thanks,
Alex

>
>  	rd_base = rdreg->base + rdreg->free_index * KVM_VGIC_V3_REDIST_SIZE;
>
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index a8d8fdcd3723..596c069263a7 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -322,6 +322,7 @@ struct vgic_cpu {
>  	 */
>  	struct vgic_io_device rd_iodev;
>  	struct vgic_redist_region *rdreg;
> +	u32 rdreg_index;
>
>  	/* Contains the attributes and gpa of the LPI pending tables. */
>  	u64 pendbaser;
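An editor's aside: the Last-bit computation at the center of this exchange is small enough to model standalone. Field names follow the patch, but the structures are pared down to just what the computation needs — this is an illustration, not the kernel code.

```c
/*
 * Standalone model of the GICR_TYPER.Last computation from the patch:
 * Last is reported for the vCPU occupying the final slot allocated so
 * far in its redistributor region.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct vgic_redist_region {
    uint32_t free_index;   /* slots handed out so far in this region */
};

struct vgic_cpu {
    struct vgic_redist_region *rdreg;
    uint32_t rdreg_index;  /* slot this vCPU occupies in rdreg */
};

/* Mirrors the patch's condition:
 * rdreg && (rdreg_index == free_index - 1) */
static bool gicr_typer_last(const struct vgic_cpu *vgic_cpu)
{
    const struct vgic_redist_region *rdreg = vgic_cpu->rdreg;

    return rdreg && vgic_cpu->rdreg_index == rdreg->free_index - 1;
}
```

With one vCPU registered, its redistributor reads back as Last; once a second slot in the same region is handed out, the first vCPU's Last bit reads back as 0 — the transient behaviour Alexandru is asking about above.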
Re: [PATCH 7/9] KVM: arm64: Simplify argument passing to vgic_uaccess_[read|write]
Hi Eric,

On 1/12/21 4:04 PM, Alexandru Elisei wrote:
> Hi Eric,
>
> On 12/12/20 6:50 PM, Eric Auger wrote:
>> Instead of converting the vgic_io_device handle to a kvm_io_device
>> handled and then do the oppositive, pass a vgic_io_device pointer all
>> along the call chain.
>
> To me, it looks like the commit message describes what the patch does
> instead of why it does it.
>
> What are "vgic_io_device handle" and "kvm_io_device handled"?

Sorry, I think I got it now. You were referring to the argument types
struct vgic_io_device and struct kvm_io_device. The patch looks like a
very good cleanup. How about changing the commit message to sound
something like this (feel free to ignore/change it if you think of
something else):

vgic_uaccess() takes a struct vgic_io_device argument, converts it to a
struct kvm_io_device and passes it to the read/write accessor functions,
which convert it back to a struct vgic_io_device. Avoid the indirection
by passing the struct vgic_io_device argument directly to
vgic_uaccess_{read,write}.

Thanks,
Alex

>
> Thanks,
> Alex
>
>> Signed-off-by: Eric Auger
>> ---
>>  arch/arm64/kvm/vgic/vgic-mmio.c | 10 --
>>  1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
>> index b2d73fc0d1ef..48c6067fc5ec 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
>> @@ -938,10 +938,9 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
>>  	return region;
>>  }
>>
>> -static int vgic_uaccess_read(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
>> +static int vgic_uaccess_read(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
>>  			     gpa_t addr, u32 *val)
>>  {
>> -	struct vgic_io_device *iodev = kvm_to_vgic_iodev(dev);
>>  	const struct vgic_register_region *region;
>>  	struct kvm_vcpu *r_vcpu;
>>
>> @@ -960,10 +959,9 @@ static int vgic_uaccess_read(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
>>  	return 0;
>>  }
>>
>> -static int vgic_uaccess_write(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
>> +static int vgic_uaccess_write(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
>>  			      gpa_t addr, const u32 *val)
>>  {
>> -	struct vgic_io_device *iodev = kvm_to_vgic_iodev(dev);
>>  	const struct vgic_register_region *region;
>>  	struct kvm_vcpu *r_vcpu;
>>
>> @@ -986,9 +984,9 @@ int vgic_uaccess(struct kvm_vcpu *vcpu, struct vgic_io_device *dev,
>>  		 bool is_write, int offset, u32 *val)
>>  {
>>  	if (is_write)
>> -		return vgic_uaccess_write(vcpu, &dev->dev, offset, val);
>> +		return vgic_uaccess_write(vcpu, dev, offset, val);
>>  	else
>> -		return vgic_uaccess_read(vcpu, &dev->dev, offset, val);
>> +		return vgic_uaccess_read(vcpu, dev, offset, val);
>>  }
>>
>>  static int dispatch_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
Re: [PATCH 7/9] KVM: arm64: Simplify argument passing to vgic_uaccess_[read|write]
Hi Eric, On 12/12/20 6:50 PM, Eric Auger wrote: > Instead of converting the vgic_io_device handle to a kvm_io_device > handled and then do the oppositive, pass a vgic_io_device pointer all > along the call chain. To me, it looks like the commit message describes what the patch does instead of why it does it. What are "vgic_io_device handle" and "kvm_io_device handled"? Thanks, Alex > > Signed-off-by: Eric Auger > --- > arch/arm64/kvm/vgic/vgic-mmio.c | 10 -- > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c > index b2d73fc0d1ef..48c6067fc5ec 100644 > --- a/arch/arm64/kvm/vgic/vgic-mmio.c > +++ b/arch/arm64/kvm/vgic/vgic-mmio.c > @@ -938,10 +938,9 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct > vgic_io_device *iodev, > return region; > } > > -static int vgic_uaccess_read(struct kvm_vcpu *vcpu, struct kvm_io_device > *dev, > +static int vgic_uaccess_read(struct kvm_vcpu *vcpu, struct vgic_io_device > *iodev, >gpa_t addr, u32 *val) > { > - struct vgic_io_device *iodev = kvm_to_vgic_iodev(dev); > const struct vgic_register_region *region; > struct kvm_vcpu *r_vcpu; > > @@ -960,10 +959,9 @@ static int vgic_uaccess_read(struct kvm_vcpu *vcpu, > struct kvm_io_device *dev, > return 0; > } > > -static int vgic_uaccess_write(struct kvm_vcpu *vcpu, struct kvm_io_device > *dev, > +static int vgic_uaccess_write(struct kvm_vcpu *vcpu, struct vgic_io_device > *iodev, > gpa_t addr, const u32 *val) > { > - struct vgic_io_device *iodev = kvm_to_vgic_iodev(dev); > const struct vgic_register_region *region; > struct kvm_vcpu *r_vcpu; > > @@ -986,9 +984,9 @@ int vgic_uaccess(struct kvm_vcpu *vcpu, struct > vgic_io_device *dev, >bool is_write, int offset, u32 *val) > { > if (is_write) > - return vgic_uaccess_write(vcpu, &dev->dev, offset, val); > + return vgic_uaccess_write(vcpu, dev, offset, val); > else > - return vgic_uaccess_read(vcpu, &dev->dev, offset, val); > + return 
vgic_uaccess_read(vcpu, dev, offset, val); > } > > static int dispatch_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device > *dev,
Re: [PATCH 6/9] docs: kvm: devices/arm-vgic-v3: enhance KVM_DEV_ARM_VGIC_CTRL_INIT doc
Hi Eric, On 12/12/20 6:50 PM, Eric Auger wrote: > kvm_arch_vcpu_precreate() returns -EBUSY if the vgic is > already initialized. So let's document that KVM_DEV_ARM_VGIC_CTRL_INIT > must be called after all vcpu creations. Checked and this is indeed the case, kvm_vm_ioctl_create_vcpu()->kvm_arch_vcpu_precreate() returns -EBUSY if vgic_initialized() is true. > > Signed-off-by: Eric Auger > --- > Documentation/virt/kvm/devices/arm-vgic-v3.rst | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/Documentation/virt/kvm/devices/arm-vgic-v3.rst > b/Documentation/virt/kvm/devices/arm-vgic-v3.rst > index 5dd3bff51978..322de6aebdec 100644 > --- a/Documentation/virt/kvm/devices/arm-vgic-v3.rst > +++ b/Documentation/virt/kvm/devices/arm-vgic-v3.rst > @@ -228,7 +228,7 @@ Groups: > > KVM_DEV_ARM_VGIC_CTRL_INIT >request the initialization of the VGIC, no additional parameter in > - kvm_device_attr.addr. > + kvm_device_attr.addr. Must be called after all vcpu creations. Nitpick here: the document writes VCPU with all caps. This also sounds a bit weird, I think something like "Must be called after all VCPUs have been created" is clearer. Thanks, Alex > KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES >save all LPI pending bits into guest RAM pending tables.
Re: [PATCH 5/9] KVM: arm: move has_run_once after the map_resources
Hi Eric, On 12/12/20 6:50 PM, Eric Auger wrote: > has_run_once is set to true at the beginning of > kvm_vcpu_first_run_init(). This generally is not an issue > except when exercising the code with KVM selftests. Indeed, > if kvm_vgic_map_resources() fails due to erroneous user settings, > has_run_once is set and this prevents from continuing > executing the test. This patch moves the assignment after the > kvm_vgic_map_resources(). > > Signed-off-by: Eric Auger > --- > arch/arm64/kvm/arm.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > index c0ffb019ca8b..331fae6bff31 100644 > --- a/arch/arm64/kvm/arm.c > +++ b/arch/arm64/kvm/arm.c > @@ -540,8 +540,6 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu) > if (!kvm_arm_vcpu_is_finalized(vcpu)) > return -EPERM; > > - vcpu->arch.has_run_once = true; > - > if (likely(irqchip_in_kernel(kvm))) { > /* >* Map the VGIC hardware resources before running a vcpu the > @@ -560,6 +558,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu) > static_branch_inc(&userspace_irqchip_in_use); > } > > + vcpu->arch.has_run_once = true; I have a few concerns regarding this: 1. Moving has_run_once = true here seems very arbitrary to me - kvm_timer_enable() and kvm_arm_pmu_v3_enable(), below it, can both fail because of erroneous user values. If there's a reason why the assignment cannot be moved at the end of the function, I think it should be clearly stated in a comment for the people who might be tempted to write similar tests for the timer or pmu. 2. There are many ways that kvm_vgic_map_resources() can fail, other than incorrect user settings. 
I started digging into how kvm_vgic_map_resources()->vgic_v2_map_resources() can fail for a VGIC V2 and this is what I managed to find before I gave up:

* vgic_init() can fail in:
  - kvm_vgic_dist_init()
  - vgic_v3_init()
  - kvm_vgic_setup_default_irq_routing()
* vgic_register_dist_iodev() can fail in:
  - vgic_v3_init_dist_iodev()
  - kvm_io_bus_register_dev() (*)
* kvm_phys_addr_ioremap() can fail in:
  - kvm_mmu_topup_memory_cache()
  - kvm_pgtable_stage2_map()

So if any of the functions above fail, are we 100% sure it is safe to allow the user to execute kvm_vgic_map_resources() again?

(*) It looks to me like kvm_io_bus_register_dev() doesn't take into account a caller that tries to register the same device address range, and it will create another identical range. Is this intentional? Is it a bug that should be fixed? Or am I misunderstanding the function?

Thanks, Alex > + > ret = kvm_timer_enable(vcpu); > if (ret) > return ret;
Re: [PATCH v3 09/21] arm64: cpufeature: Add global feature override facility
On 2021-01-12 11:59, Suzuki K Poulose wrote: On 1/12/21 11:50 AM, Marc Zyngier wrote: Hi Suzuki, On 2021-01-12 09:17, Suzuki K Poulose wrote: Hi Marc, On 1/11/21 7:48 PM, Marc Zyngier wrote: [...] diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 894af60b9669..00d99e593b65 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -774,6 +774,7 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) u64 strict_mask = ~0x0ULL; u64 user_mask = 0; u64 valid_mask = 0; + u64 override_val = 0, override_mask = 0; const struct arm64_ftr_bits *ftrp; struct arm64_ftr_reg *reg = get_arm64_ftr_reg(sys_reg); @@ -781,9 +782,35 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) if (!reg) return; + if (reg->override_mask && reg->override_val) { + override_mask = *reg->override_mask; + override_val = *reg->override_val; + } + for (ftrp = reg->ftr_bits; ftrp->width; ftrp++) { u64 ftr_mask = arm64_ftr_mask(ftrp); s64 ftr_new = arm64_ftr_value(ftrp, new); + s64 ftr_ovr = arm64_ftr_value(ftrp, override_val); + + if ((ftr_mask & override_mask) == ftr_mask) { + if (ftr_ovr < ftr_new) { Here we assume that all the features are FTR_LOWER_SAFE. We could probably use arm64_ftr_safe_value(ftrp, ftr_new, ftr_ovr) here ? That would cover us for both HIGHER_SAFE and LOWER_SAFE features. However that may be restrictive for FTR_EXACT, as we the safe value would be set to "ftr->safe_val". I guess that may be better than forcing to use an unsafe value for the boot CPU, which could anyway conflict with the other CPUs and eventually trigger the ftr alue to be safe_val. I like the idea of using the helper, as it cleanups up the code a bit. However, not being to set a feature to a certain value could be restrictive, as in general, it means that we can only disable a feature and not adjust its level of support. Take PMUVER for example: with the helper, I can't override it from v8.4 to v8.1. I can only go to v8.0. 
My point is, we set this only for the "init" of cpu features. So, even if we init to a custom, non-(default-safe) value, the secondary CPUs could scream, and the system wide safe value could fall back to the "safe" value for EXACT features, no matter what you did to init it. Right. So let's go with the safe value for EXACT features for now, and let the override fail if that's not what the user asked for. After all, there are only so many things we want to support as an override, and in all the cases at hand, using the safe value actually matches what we want to do. We can always revisit this if and when we need a different behaviour. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [RFC PATCH v2 15/26] of/fdt: Introduce early_init_dt_add_memory_hyp()
On Monday 11 Jan 2021 at 08:45:10 (-0600), Rob Herring wrote: > On Fri, Jan 8, 2021 at 6:16 AM Quentin Perret wrote: > > > > Introduce early_init_dt_add_memory_hyp() to allow KVM to conserve a copy > > of the memory regions parsed from DT. This will be needed in the context > > of the protected nVHE feature of KVM/arm64 where the code running at EL2 > > will be cleanly separated from the host kernel during boot, and will > > need its own representation of memory. > > What happened to doing this with memblock? I gave it a go, but as mentioned in v1, I ran into issues for nomap regions. I want the hypervisor to know about these memory regions (it's possible some of those will be given to protected guests for instance) but these seem to be entirely removed from the memblocks when using DT: https://elixir.bootlin.com/linux/latest/source/drivers/of/fdt.c#L1153 EFI appears to do things differently, though, as it 'just' uses memblock_mark_nomap() instead of actively removing the memblock. And that means I could actually use the memblock API for EFI, but I'd rather have a common solution. I tried to understand why things are done differently but couldn't find an answer and kept things simple and working for now. Is there a good reason for not using memblock_mark_nomap() with DT? If not, I'm happy to try that. Thanks, Quentin
Re: [PATCH v3 09/21] arm64: cpufeature: Add global feature override facility
On 1/12/21 11:51 AM, Marc Zyngier wrote: On 2021-01-12 11:50, Marc Zyngier wrote: Hi Suzuki, On 2021-01-12 09:17, Suzuki K Poulose wrote: Hi Marc, On 1/11/21 7:48 PM, Marc Zyngier wrote: [...] diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 894af60b9669..00d99e593b65 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -774,6 +774,7 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) u64 strict_mask = ~0x0ULL; u64 user_mask = 0; u64 valid_mask = 0; + u64 override_val = 0, override_mask = 0; const struct arm64_ftr_bits *ftrp; struct arm64_ftr_reg *reg = get_arm64_ftr_reg(sys_reg); @@ -781,9 +782,35 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) if (!reg) return; + if (reg->override_mask && reg->override_val) { + override_mask = *reg->override_mask; + override_val = *reg->override_val; + } + for (ftrp = reg->ftr_bits; ftrp->width; ftrp++) { u64 ftr_mask = arm64_ftr_mask(ftrp); s64 ftr_new = arm64_ftr_value(ftrp, new); + s64 ftr_ovr = arm64_ftr_value(ftrp, override_val); + + if ((ftr_mask & override_mask) == ftr_mask) { + if (ftr_ovr < ftr_new) { Here we assume that all the features are FTR_LOWER_SAFE. We could probably use arm64_ftr_safe_value(ftrp, ftr_new, ftr_ovr) here ? That would cover us for both HIGHER_SAFE and LOWER_SAFE features. However that may be restrictive for FTR_EXACT, as we the safe value would be set to "ftr->safe_val". I guess that may be better than forcing to use an unsafe value for the boot CPU, which could anyway conflict with the other CPUs and eventually trigger the ftr alue to be safe_val. I like the idea of using the helper, as it cleanups up the code a bit. However, not being to set a feature to a certain value could be restrictive, as in general, it means that we can only disable a feature and not adjust its level of support. Take PMUVER for example: with the helper, I can't override it from v8.4 to v8.1. I can only go to v8.0. 
Actually, we can only *disable* the PMU altogether. Same question though... It depends on two things: 1) What is the safe value for an EXACT typed feature? Usually, that means either disabled, or the lowest possible value. 2) How is this value consumed? a) i.e., do we use the per-CPU value? Then none of these changes have any effect. b) The system wide value? Then we get the safe value as "influenced" by the infrastructure. The safe value we use for EXACT features is exclusively for making sure that the system uses the safe assumption and thus should be the best option. To answer your question, for PMU, it is 0, which implies v8.0. Or we could update the safe value to -1 (0xf), which is a bit safer, kind of implying that the PMU is not a standard one. Cheers Suzuki M. Is it something we care about? Thanks, M.
Re: [PATCH 4/5] vfio/iommu_type1: Carefully use unmap_unpin_all during dirty tracking
On 2021/1/12 5:49, Alex Williamson wrote: > On Thu, 7 Jan 2021 17:29:00 +0800 > Keqian Zhu wrote: > >> If we detach group during dirty page tracking, we shouldn't remove >> vfio_dma, because dirty log will lose. >> >> But we don't prevent unmap_unpin_all in vfio_iommu_release, because >> under normal procedure, dirty tracking has been stopped. > > This looks like it's creating a larger problem than it's fixing, it's > not our job to maintain the dirty bitmap regardless of what the user > does. If the user detaches the last group in a container causing the > mappings within that container to be deconstructed before the user has > collected dirty pages, that sounds like a user error. A container with > no groups is de-privileged and therefore loses all state. Thanks, > > Alex Hi Alex, This looks good to me ;-). That's a reasonable constraint for user behavior. What about replacing this patch with an addition to the uapi document of VFIO_GROUP_UNSET_CONTAINER? User should pay attention to this when call this ioctl during dirty tracking. And any comments on other patches? thanks. Thanks. Keqian > >> Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages >> tracking") >> Signed-off-by: Keqian Zhu >> --- >> drivers/vfio/vfio_iommu_type1.c | 14 -- >> 1 file changed, 12 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/vfio/vfio_iommu_type1.c >> b/drivers/vfio/vfio_iommu_type1.c >> index 26b7eb2a5cfc..9776a059904d 100644 >> --- a/drivers/vfio/vfio_iommu_type1.c >> +++ b/drivers/vfio/vfio_iommu_type1.c >> @@ -2373,7 +2373,12 @@ static void vfio_iommu_type1_detach_group(void >> *iommu_data, >> if (list_empty(&iommu->external_domain->group_list)) { >> vfio_sanity_check_pfn_list(iommu); >> >> -if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) >> +/* >> + * During dirty page tracking, we can't remove >> + * vfio_dma because dirty log will lose. 
>> + */ >> +if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) && >> +!iommu->dirty_page_tracking) >> vfio_iommu_unmap_unpin_all(iommu); >> >> kfree(iommu->external_domain); >> @@ -2406,10 +2411,15 @@ static void vfio_iommu_type1_detach_group(void >> *iommu_data, >> * iommu and external domain doesn't exist, then all the >> * mappings go away too. If it's the last domain with iommu and >> * external domain exist, update accounting >> + * >> + * Note: During dirty page tracking, we can't remove vfio_dma >> + * because dirty log will lose. Just update accounting is a good >> + * choice. >> */ >> if (list_empty(&domain->group_list)) { >> if (list_is_singular(&iommu->domain_list)) { >> -if (!iommu->external_domain) >> +if (!iommu->external_domain && >> +!iommu->dirty_page_tracking) >> vfio_iommu_unmap_unpin_all(iommu); >> else >> vfio_iommu_unmap_unpin_reaccount(iommu); > > . >
Re: [PATCH v3 09/21] arm64: cpufeature: Add global feature override facility
On 1/12/21 11:50 AM, Marc Zyngier wrote: Hi Suzuki, On 2021-01-12 09:17, Suzuki K Poulose wrote: Hi Marc, On 1/11/21 7:48 PM, Marc Zyngier wrote: [...] diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 894af60b9669..00d99e593b65 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -774,6 +774,7 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) u64 strict_mask = ~0x0ULL; u64 user_mask = 0; u64 valid_mask = 0; + u64 override_val = 0, override_mask = 0; const struct arm64_ftr_bits *ftrp; struct arm64_ftr_reg *reg = get_arm64_ftr_reg(sys_reg); @@ -781,9 +782,35 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) if (!reg) return; + if (reg->override_mask && reg->override_val) { + override_mask = *reg->override_mask; + override_val = *reg->override_val; + } + for (ftrp = reg->ftr_bits; ftrp->width; ftrp++) { u64 ftr_mask = arm64_ftr_mask(ftrp); s64 ftr_new = arm64_ftr_value(ftrp, new); + s64 ftr_ovr = arm64_ftr_value(ftrp, override_val); + + if ((ftr_mask & override_mask) == ftr_mask) { + if (ftr_ovr < ftr_new) { Here we assume that all the features are FTR_LOWER_SAFE. We could probably use arm64_ftr_safe_value(ftrp, ftr_new, ftr_ovr) here ? That would cover us for both HIGHER_SAFE and LOWER_SAFE features. However that may be restrictive for FTR_EXACT, as we the safe value would be set to "ftr->safe_val". I guess that may be better than forcing to use an unsafe value for the boot CPU, which could anyway conflict with the other CPUs and eventually trigger the ftr alue to be safe_val. I like the idea of using the helper, as it cleanups up the code a bit. However, not being to set a feature to a certain value could be restrictive, as in general, it means that we can only disable a feature and not adjust its level of support. Take PMUVER for example: with the helper, I can't override it from v8.4 to v8.1. I can only go to v8.0. My point is, we set this only for the "init" of cpu features. 
So, even if we init to a custom, non-(default-safe) value, the secondary CPUs could scream, and the system wide safe value could fall back to the "safe" value for EXACT features, no matter what you did to init it. Also, it will be dangerous for things like PAC, as the lower level value may not be compatible with the "actual" cpu feature supported. So simply setting to a lower value for EXACT features is generally not safe, though I understand some are exceptions. Suzuki Is it something we care about? Thanks, M.
Re: [PATCH v3 09/21] arm64: cpufeature: Add global feature override facility
On 2021-01-12 11:50, Marc Zyngier wrote: Hi Suzuki, On 2021-01-12 09:17, Suzuki K Poulose wrote: Hi Marc, On 1/11/21 7:48 PM, Marc Zyngier wrote: [...] diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 894af60b9669..00d99e593b65 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -774,6 +774,7 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) u64 strict_mask = ~0x0ULL; u64 user_mask = 0; u64 valid_mask = 0; + u64 override_val = 0, override_mask = 0; const struct arm64_ftr_bits *ftrp; struct arm64_ftr_reg *reg = get_arm64_ftr_reg(sys_reg); @@ -781,9 +782,35 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) if (!reg) return; + if (reg->override_mask && reg->override_val) { + override_mask = *reg->override_mask; + override_val = *reg->override_val; + } + for (ftrp = reg->ftr_bits; ftrp->width; ftrp++) { u64 ftr_mask = arm64_ftr_mask(ftrp); s64 ftr_new = arm64_ftr_value(ftrp, new); + s64 ftr_ovr = arm64_ftr_value(ftrp, override_val); + + if ((ftr_mask & override_mask) == ftr_mask) { + if (ftr_ovr < ftr_new) { Here we assume that all the features are FTR_LOWER_SAFE. We could probably use arm64_ftr_safe_value(ftrp, ftr_new, ftr_ovr) here ? That would cover us for both HIGHER_SAFE and LOWER_SAFE features. However that may be restrictive for FTR_EXACT, as we the safe value would be set to "ftr->safe_val". I guess that may be better than forcing to use an unsafe value for the boot CPU, which could anyway conflict with the other CPUs and eventually trigger the ftr alue to be safe_val. I like the idea of using the helper, as it cleanups up the code a bit. However, not being to set a feature to a certain value could be restrictive, as in general, it means that we can only disable a feature and not adjust its level of support. Take PMUVER for example: with the helper, I can't override it from v8.4 to v8.1. I can only go to v8.0. Actually, we can only *disable* the PMU altogether. 
Same question though... M. Is it something we care about? Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v3 09/21] arm64: cpufeature: Add global feature override facility
Hi Suzuki, On 2021-01-12 09:17, Suzuki K Poulose wrote: Hi Marc, On 1/11/21 7:48 PM, Marc Zyngier wrote: [...] diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 894af60b9669..00d99e593b65 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -774,6 +774,7 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) u64 strict_mask = ~0x0ULL; u64 user_mask = 0; u64 valid_mask = 0; + u64 override_val = 0, override_mask = 0; const struct arm64_ftr_bits *ftrp; struct arm64_ftr_reg *reg = get_arm64_ftr_reg(sys_reg); @@ -781,9 +782,35 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) if (!reg) return; + if (reg->override_mask && reg->override_val) { + override_mask = *reg->override_mask; + override_val = *reg->override_val; + } + for (ftrp = reg->ftr_bits; ftrp->width; ftrp++) { u64 ftr_mask = arm64_ftr_mask(ftrp); s64 ftr_new = arm64_ftr_value(ftrp, new); + s64 ftr_ovr = arm64_ftr_value(ftrp, override_val); + + if ((ftr_mask & override_mask) == ftr_mask) { + if (ftr_ovr < ftr_new) { Here we assume that all the features are FTR_LOWER_SAFE. We could probably use arm64_ftr_safe_value(ftrp, ftr_new, ftr_ovr) here? That would cover us for both HIGHER_SAFE and LOWER_SAFE features. However that may be restrictive for FTR_EXACT, as the safe value would be set to "ftr->safe_val". I guess that may be better than forcing to use an unsafe value for the boot CPU, which could anyway conflict with the other CPUs and eventually trigger the ftr value to be safe_val. I like the idea of using the helper, as it cleans up the code a bit. However, not being able to set a feature to a certain value could be restrictive, as in general, it means that we can only disable a feature and not adjust its level of support. Take PMUVER for example: with the helper, I can't override it from v8.4 to v8.1. I can only go to v8.0. Is it something we care about? Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v3 09/21] arm64: cpufeature: Add global feature override facility
Hi Marc, On 1/11/21 7:48 PM, Marc Zyngier wrote: Hi Catalin, On 2021-01-11 18:41, Catalin Marinas wrote: Hi Marc, On Mon, Jan 11, 2021 at 01:27:59PM +, Marc Zyngier wrote: Add a facility to globally override a feature, no matter what the HW says. Yes, this is dangerous. Yeah, it's dangerous. We can make it less so if we only allow safe values (e.g. lower if FTR_UNSIGNED). My plan was also to allow non-safe values in order to trigger features that are not advertised by the HW. But I can understand if you are reluctant to allow such thing! :D diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 9a555809b89c..465d2cb63bfc 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -75,6 +75,8 @@ struct arm64_ftr_reg { u64 sys_val; u64 user_val; const struct arm64_ftr_bits *ftr_bits; + u64 *override_val; + u64 *override_mask; }; At the arm64_ftr_reg level, we don't have any information about the safe values for a feature. Could we instead move this to arm64_ftr_bits? We probably only need a single field. When populating the feature values, we can make sure it doesn't go above the hardware one. I attempted a feature modification for MTE here, though I dropped the entire series in the meantime as we clarified the ARM ARM: https://lore.kernel.org/linux-arm-kernel/20200515171612.1020-24-catalin.mari...@arm.com/ Srinivas copied it in his patch (but forgot to give credit ;)): https://lore.kernel.org/linux-arm-msm/1610152163-16554-3-git-send-email-sram...@codeaurora.org/ The above adds a filter function but, instead, just use your mechanism in this series for idreg.feature setting via cmdline. The arm64_ftr_value() function extracts the hardware value and lowers it if a cmdline argument was passed. One thing is that it is not always possible to sanitise the value passed if it is required very early on, as I do with VHE. 
But in that case I actually check that we are VHE capable before starting to poke at VHE-specific state. I came up with the following patch on top, which preserves the current global approach (no per arm64_ftr_bits state), but checks (and alters) the override as it iterates through the various fields. For example, if I pass "arm64.nopauth kvm-arm.mode=nvhe id_aa64pfr1.bt=5" to the FVP, I get the following output: [ 0.00] CPU features: SYS_ID_AA64ISAR1_EL1[31:28]: forced from 1 to 0 [ 0.00] CPU features: SYS_ID_AA64ISAR1_EL1[11:8]: forced from 1 to 0 [ 0.00] CPU features: SYS_ID_AA64MMFR1_EL1[11:8]: forced from 1 to 0 [ 0.00] CPU features: SYS_ID_AA64PFR1_EL1[3:0]: not forcing 1 to 5 [ 0.00] CPU features: detected: GIC system register CPU interface [ 0.00] CPU features: detected: Hardware dirty bit management [ 0.00] CPU features: detected: Spectre-v4 [ 0.00] CPU features: detected: Branch Target Identification showing that the PAC features have been downgraded, together with VHE, but that BTI is still detected as value 5 was obviously bogus. Thoughts? M. 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 894af60b9669..00d99e593b65 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -774,6 +774,7 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) u64 strict_mask = ~0x0ULL; u64 user_mask = 0; u64 valid_mask = 0; + u64 override_val = 0, override_mask = 0; const struct arm64_ftr_bits *ftrp; struct arm64_ftr_reg *reg = get_arm64_ftr_reg(sys_reg); @@ -781,9 +782,35 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) if (!reg) return; + if (reg->override_mask && reg->override_val) { + override_mask = *reg->override_mask; + override_val = *reg->override_val; + } + for (ftrp = reg->ftr_bits; ftrp->width; ftrp++) { u64 ftr_mask = arm64_ftr_mask(ftrp); s64 ftr_new = arm64_ftr_value(ftrp, new); + s64 ftr_ovr = arm64_ftr_value(ftrp, override_val); + + if ((ftr_mask & override_mask) == ftr_mask) { + if (ftr_ovr < ftr_new) { Here we assume that all the features are FTR_LOWER_SAFE. We could probably use arm64_ftr_safe_value(ftrp, ftr_new, ftr_ovr) here? That would cover us for both HIGHER_SAFE and LOWER_SAFE features. However that may be restrictive for FTR_EXACT, as the safe value would be set to "ftr->safe_val". I guess that may be better than forcing to use an unsafe value for the boot CPU, which could anyway conflict with the other CPUs and eventually trigger the ftr value to be safe_val. i.e., ftr_val = arm64_ftr_safe_value(ftrp, ftr_ovr, ftr_new); Suzuki
Re: [PATCH v3 00/66] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
On Mon, 11 Jan 2021 at 16:59, Marc Zyngier wrote: > > Hi Haibo, > > On 2021-01-11 07:20, Haibo Xu wrote: > > On Fri, 11 Dec 2020 at 00:00, Marc Zyngier wrote: > >> > >> This is a rework of the NV series that I posted 10 months ago[1], as a > >> lot of the KVM code has changed since, and the series apply anymore > >> (not that anybody really cares as the the HW is, as usual, made of > >> unobtainium...). > >> > >> From the previous version: > >> > >> - Integration with the new page-table code > >> - New exception injection code > >> - No more messing with the nVHE code > >> - No AArch32 > >> - Rebased on v5.10-rc4 + kvmarm/next for 5.11 > >> > >> From a functionality perspective, you can expect a L2 guest to work, > >> but don't even think of L3, as we only partially emulate the > >> ARMv8.{3,4}-NV extensions themselves. Same thing for vgic, debug, PMU, > >> as well as anything that would require a Stage-1 PTW. What we want to > >> achieve is that with NV disabled, there is no performance overhead and > >> no regression. > >> > >> The series is roughly divided in 5 parts: exception handling, memory > >> virtualization, interrupts and timers for ARMv8.3, followed by the > >> ARMv8.4 support. There are of course some dependencies, but you'll > >> hopefully get the gist of it. > >> > >> For the most courageous of you, I've put out a branch[2]. Of course, > >> you'll need some userspace. Andre maintains a hacked version of > >> kvmtool[3] that takes a --nested option, allowing the guest to be > >> started at EL2. You can run the whole stack in the Foundation > >> model. Don't be in a hurry ;-). > >> > > > > Hi Marc, > > > > I got a kernel BUG message when booting the L2 guest kernel with the > > kvmtool on a FVP setup. > > Could you help have a look about the BUG message as well as my > > environment configuration? > > I think It probably caused by some local configurations of the FVP > > setup. 
> > No, this is likely a bug in your L1 guest, which was fixed in -rc3: > > 2a5f1b67ec57 ("KVM: arm64: Don't access PMCR_EL0 when no PMU is > available") > > and was found in the exact same circumstances. Alternatively, and if > you don't want to change your L1 guest, you can just pass the --pmu > option to kvmtool when starting the L1 guest. After passing --pmu when starting a L1 guest, I can successfully run a L2 guest now! Thanks so much for the help! Haibo > > Hope this helps, > > M. > -- > Jazz is not dead. It just smells funny...